NUMERICAL STUDIES IN THE SEQUENTIAL ESTIMATION OF
A BINOMIAL PARAMETER 


By P. ARMITAGE 


Statistical Research Unit of the Medical Research Council, 
London School of Hygiene and Tropical Medicine*


1. INTRODUCTION 


1.1. The literature on sequential estimation procedures is rather scanty. Some authors,
such as Haldane (1945), have investigated the problem of estimation when sampling is
conducted according to particular stopping rules, and Girshick, Mosteller & Savage (1946)
have given a general method for obtaining unbiased estimates of a binomial parameter for
fairly general stopping rules. Anscombe (1953) has proposed certain stopping rules for
which fixed-sample-size formulae are asymptotically valid, and Ray (1957) has investigated
the small-sample properties of some similar procedures. Cox (1952) has considered
asymptotic properties of sequential estimation procedures for general boundaries. The problem
of obtaining exact estimation methods for general stopping rules, or even for the most
widely known system of sequential designs (that due to Wald), is still unsolved.

Wald’s system was not intended primarily as a means of estimation. Nevertheless, it 
may frequently happen that an investigator chooses a Wald-type procedure, or some other, 
for good reasons, and then wishes to carry out some sort of estimation procedure for the 
parameter. How misleading will it be to use fixed-sample-size formulae, in spite of the fact
that the observations were made sequentially?

The unbiased estimator given by Girshick et al. (1946) may, in particular instances, differ
appreciably from the maximum likelihood estimator which would normally be used
in fixed-size sampling. But unbiasedness is not a property necessarily required of an 
estimator, and it would be interesting to know what degree of bias is associated with the 
usual estimator. 

An analytical approach to this problem does not appear to be simple, since it involves a 
knowledge of sample-space distributions which is not in general available in a simple form. 
For binomial sampling with any particular boundaries these distributions can be computed 
exactly, though perhaps laboriously, and hence many questions like those discussed above 
can be answered unambiguously. In this paper results are presented for three particular 
designs: two truncated Wald schemes and a ‘restricted’ procedure of the type described by 
Armitage (1957). The extent to which one can generalize from a sample of three is, of course,
debatable, but the results for the three procedures have some features in common, which 
are discussed in the final section of the paper. 

For each design, two questions are investigated: first, the distributions of the maximum
likelihood estimator and of the unbiased estimator, for various values of the parameter;
and secondly, the establishment of confidence intervals for the unknown parameter. These 
two aspects of the investigation are discussed separately in the following sections. 


* This work was completed, and the paper written, while the author was a visiting scientist at the
National Institutes of Health, Bethesda, Maryland, U.S.A.
1.2. Bias. If $r$ successes have been observed out of $n$ binomial trials, the maximum
likelihood estimator of the unknown probability of a success, $\theta$, is $\hat\theta = r/n$, irrespective of
the stopping rules under which the observations have been obtained. For fixed-size sampling,
$\hat\theta$ is unbiased. For any particular sequential stopping rule, $\hat\theta$ may be expected in general
to be biased. A procedure for obtaining an unbiased estimator, for a wide class of sequential
stopping rules, has been given by Girshick et al. (1946). This is given by $\hat\theta_u = N'/N$, where
$N$ is the number of permissible orders (i.e. those not prohibited by a stopping rule) in which
the observed results could have been obtained, and $N'$ is the number of permissible orders
subject to the restriction that the first observation is a success.
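The definition of $\hat\theta_u$ lends itself to a short forward recursion over the $(n, r)$ lattice that carries two counts at each point: all permissible paths from $(0, 0)$, and those whose first step is a success. The Python sketch below is an illustration under stated assumptions, not the computation used in this paper. `boundary_counts`, `unbiased_estimates` and `inverse_sampling_stop` are hypothetical names, and the stopping rule is passed in as a predicate. The example design (sampling until the second success, truncated at $n = 12$) was chosen only because it is easy to check by hand: at $(n, 2)$ there are $n - 1$ permissible orders, and exactly one of them begins with a success, so $\hat\theta_u = 1/(n-1)$.

```python
from fractions import Fraction


def boundary_counts(stop, n_max):
    """Path counts at the boundary points of a closed binomial design.

    stop(n, r) is True when sampling stops at (n, r).  Returns a dict
    {(n, r): (N, N1)}: N counts permissible paths from (0, 0), and N1
    counts those whose first observation is a success (paths from (1, 1)).
    Sampling is assumed never to stop at (0, 0).
    """
    current = {0: (1, 0), 1: (1, 1)}  # r -> (N, N1) after the first trial
    points = {}
    for n in range(1, n_max + 1):
        nxt = {}
        for r, (N, N1) in current.items():
            if stop(n, r):
                points[(n, r)] = (N, N1)
            elif n == n_max:
                raise ValueError(f"design is not closed: ({n}, {r}) continues")
            else:
                for r_next in (r, r + 1):  # next trial a failure, then a success
                    a, b = nxt.get(r_next, (0, 0))
                    nxt[r_next] = (a + N, b + N1)
        current = nxt
    return points


def unbiased_estimates(points):
    """The estimate N'/N (here N1/N) at every boundary point."""
    return {pt: Fraction(N1, N) for pt, (N, N1) in points.items()}


def inverse_sampling_stop(n, r):
    """Hypothetical design: stop at the second success, or at n = 12."""
    return r == 2 or n == 12


estimates = unbiased_estimates(boundary_counts(inverse_sampling_stop, 12))
for n in range(2, 7):
    print((n, 2), estimates[(n, 2)])
```

Exact fractions are used so that the estimates can be compared with hand counts without rounding.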

For particular sequential boundaries, $N'$ and $N$ can be obtained by enumeration of the
number of admissible paths from the points $(1, 1)$ and $(0, 0)$ to the point $(n, r)$ on a lattice
diagram in which the numbers of trials and successes are plotted as abscissa and ordinate,
respectively. For designs with linear boundaries, the enumeration can conveniently be
effected by matrix multiplication (Stockman & Armitage, 1946).
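The same counts can be produced by repeated matrix-vector products, which is the idea behind enumeration by matrix multiplication. The sketch below is our own plain-Python illustration and does not reproduce the scheme of Stockman & Armitage (1946). Each trial multiplies the vector of path counts, indexed by $r$, by a 0/1 matrix that sends $r$ to $r$ (failure) or to $r + 1$ (success). Entries that fall on a boundary are recorded and then set to zero. The helper names are hypothetical. The stopping rule uses the rounded Wald lines of Example 1, $r = 1.50 + 0.75n$ and $r = -1.25 + 0.75n$, multiplied by 4 to keep the arithmetic in integers. Only $n \le 7$ is explored, so truncation plays no part.

```python
def matvec(matrix, vector):
    """Plain-Python matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transition(size):
    """(size + 1) x size 0/1 matrix: column j feeds rows j (failure) and j + 1 (success)."""
    return [[1 if i in (j, j + 1) else 0 for j in range(size)]
            for i in range(size + 1)]


def enumerate_by_matrices(stop, n_max):
    """Return {(n, r): N} for the boundary points reached with n <= n_max."""
    counts = [1]  # one path at (0, 0); the index is the number of successes r
    found = {}
    for n in range(1, n_max + 1):
        counts = matvec(transition(len(counts)), counts)
        for r, c in enumerate(counts):
            if c and stop(n, r):
                found[(n, r)] = c
                counts[r] = 0  # stopped paths are not continued
    return found


def wald_stop_example1(n, r):
    """Rounded Wald lines of Example 1, times 4: 4r >= 3n + 6 or 4r <= 3n - 5."""
    return 4 * r >= 3 * n + 6 or 4 * r <= 3 * n - 5


print(enumerate_by_matrices(wald_stop_example1, 7))
```

For linear boundaries the band of non-zero entries has bounded width, so in practice one would keep only that band rather than the full vector used here for clarity.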

Although $\hat\theta_u$ is unbiased, its distribution may conceivably have features making it less
satisfactory than the biased maximum likelihood estimator, $\hat\theta$. The distributions of $\hat\theta$ and $\hat\theta_u$
have been obtained by calculation of the exact probability of reaching each boundary point,
for various values of $\theta$. For the boundary point $(n_0, r_0)$, this probability is $N\theta^{r_0}(1-\theta)^{n_0-r_0}$.
Probabilities were calculated to eight decimal places. The computations were to some
extent recursive and were subject to cumulative errors, but the totals over all boundary
points differed from unity by at most 1 unit in the seventh decimal place in Examples 1
and 3, and by at most 5 units in the fifth place in Example 2.
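Given $N$ and $N'$ at every boundary point, the exact distributions of $\hat\theta$ and $\hat\theta_u$ follow from the point probabilities $N\theta^{r_0}(1-\theta)^{n_0-r_0}$. The minimal sketch below assumes a closed design supplied as a stopping predicate. The path-counting helper is repeated so that the block stands alone, and all names are hypothetical. It computes the mean, variance and mean-square error about $\theta$ of both estimators, using floating point rather than a fixed number of decimals. The illustrative design is inverse sampling until the second success, truncated at $n = 12$.

```python
def boundary_counts(stop, n_max):
    """{(n, r): (N, N1)} for a closed design: N counts permissible paths from
    (0, 0), N1 those whose first trial is a success.  stop(0, 0) is assumed False."""
    current, points = {0: (1, 0), 1: (1, 1)}, {}
    for n in range(1, n_max + 1):
        nxt = {}
        for r, (N, N1) in current.items():
            if stop(n, r):
                points[(n, r)] = (N, N1)
            elif n == n_max:
                raise ValueError(f"design is not closed: ({n}, {r}) continues")
            else:
                for r_next in (r, r + 1):  # failure, then success
                    a, b = nxt.get(r_next, (0, 0))
                    nxt[r_next] = (a + N, b + N1)
        current = nxt
    return points


def characteristics(points, theta):
    """Total probability, and (mean, variance, MSE about theta) of the
    maximum likelihood estimator r/n and the unbiased estimator N1/N."""
    probs, ml, ub = [], [], []
    for (n, r), (N, N1) in points.items():
        probs.append(N * theta**r * (1 - theta) ** (n - r))
        ml.append(r / n)
        ub.append(N1 / N)
    summary = {"total": sum(probs)}
    for name, values in (("ml", ml), ("unbiased", ub)):
        mean = sum(p * v for p, v in zip(probs, values))
        var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
        mse = sum(p * (v - theta) ** 2 for p, v in zip(probs, values))
        summary[name] = (mean, var, mse)
    return summary


def inverse_sampling_stop(n, r):
    """Hypothetical design: stop at the second success, or at n = 12."""
    return r == 2 or n == 12


points = boundary_counts(inverse_sampling_stop, 12)
for theta in (0.1, 0.3, 0.7):
    print(theta, characteristics(points, theta))
```

Two checks follow from the theory. The point probabilities sum to 1 because the design is closed. The mean of $\hat\theta_u$ equals $\theta$, because the sum of $N'\theta^{r}(1-\theta)^{n-r}$ over the boundary is $\theta$ times the probability that a path started at $(1, 1)$ terminates.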

1.3. Confidence intervals. If $r_0$ successes out of $n_0$ are observed in fixed-size sampling,
a central confidence interval with confidence coefficient $1 - 2\gamma$ is customarily obtained as
$(\theta_L, \theta_U)$, where $P(r \ge r_0 \mid \theta_L) = P(r \le r_0 \mid \theta_U) = \gamma$. In repeated sampling from a population with
any value $\theta$, the probability that the statement $\theta_L < \theta < \theta_U$ is true is at least $1 - 2\gamma$.
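The classical limits can be computed directly from the two defining equations, without tables. The minimal sketch below assumes that bisection on $(0, 1)$ is adequate. The helpers `binom_cdf`, `solve_increasing` and `classical_interval` are hypothetical names, not functions from any library. The lower limit solves $P(r \ge r_0 \mid \theta_L) = \gamma$ and the upper limit solves $P(r \le r_0 \mid \theta_U) = \gamma$. There is no lower limit when $r_0 = 0$, and no upper limit when $r_0 = n_0$.

```python
from math import comb


def binom_cdf(r, n, theta):
    """P(R <= r) for R ~ Binomial(n, theta)."""
    return sum(comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in range(r + 1))


def solve_increasing(f, target, tol=1e-12):
    """Bisection on (0, 1) for f(theta) = target, with f increasing."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def classical_interval(r0, n0, gamma):
    """Central limits (theta_L, theta_U) with coefficient 1 - 2*gamma.

    theta_L solves P(r >= r0 | theta_L) = gamma and theta_U solves
    P(r <= r0 | theta_U) = gamma; None marks a limit that does not exist.
    """
    lower = upper = None
    if r0 > 0:  # P(r >= r0 | theta) increases with theta
        lower = solve_increasing(lambda t: 1 - binom_cdf(r0 - 1, n0, t), gamma)
    if r0 < n0:  # P(r <= r0 | theta) decreases with theta, so negate it
        upper = solve_increasing(lambda t: -binom_cdf(r0, n0, t), -gamma)
    return lower, upper


print(classical_interval(6, 10, 0.025))
```

As a usage check, `classical_interval(6, 10, 0.025)` should agree to two decimals with the classical 95% limits listed for the boundary point (10, 6) in Table 1.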

In repeated sampling according to any prescribed sequential stopping rule, the probability
statement made in the last sentence is no longer necessarily true. Even if, for some
particular $\theta$, the probability that $\theta$ is included in the interval is greater than the nominal
$1 - 2\gamma$, its value will in general differ from that given by fixed-size sampling. The first
question, then, is: what is the probability that the ‘classical’ confidence interval includes $\theta$,
for repeated sampling under these sequential stopping rules? Once the probability
distributions over the boundary points have been calculated, as indicated in §1.2, this question
can easily be answered, for the classical confidence interval can be obtained by interpolation
in tables of the cumulative binomial distribution (National Bureau of Standards, 1949;
Romig, 1953).
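Once the classical limits and the boundary-point probabilities are available, the first question reduces to a sum. The probability of exclusion above the upper limit is the total probability of the boundary points whose classical upper limit lies below $\theta$, and similarly for the lower limit. The sketch below uses hypothetical helper names, and bisection replaces interpolation in binomial tables. It takes a dict that maps each boundary point to its path count $N$. Here it is exercised on fixed-size sampling with $n = 10$, where neither exclusion probability can exceed $\gamma$. For a sequential design, the same function would be applied to the counts $N$ at that design's boundary points.

```python
from math import comb


def binom_cdf(r, n, theta):
    """P(R <= r) for R ~ Binomial(n, theta)."""
    return sum(comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in range(r + 1))


def solve_increasing(f, target, tol=1e-12):
    """Bisection on (0, 1) for f(theta) = target, with f increasing."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def classical_interval(r0, n0, gamma):
    """Fixed-sample-size central limits; None where a limit does not exist."""
    lower = upper = None
    if r0 > 0:
        lower = solve_increasing(lambda t: 1 - binom_cdf(r0 - 1, n0, t), gamma)
    if r0 < n0:
        upper = solve_increasing(lambda t: -binom_cdf(r0, n0, t), -gamma)
    return lower, upper


def exclusion_probabilities(points, theta, gamma):
    """P(theta above the classical upper limit), P(theta below the lower limit).

    points maps each boundary point (n, r) to its path count N, so the
    probability of stopping there is N * theta**r * (1 - theta)**(n - r).
    """
    above = below = 0.0
    for (n, r), N in points.items():
        p = N * theta**r * (1 - theta) ** (n - r)
        lower, upper = classical_interval(r, n, gamma)
        if upper is not None and theta > upper:
            above += p
        if lower is not None and theta < lower:
            below += p
    return above, below


# Fixed-size sampling with n = 10, treated as a degenerate "boundary".
fixed = {(10, r): comb(10, r) for r in range(11)}
for theta in (0.2, 0.5, 0.8):
    print(theta, exclusion_probabilities(fixed, theta, 0.025))
```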

The second question in this connexion is how to set up, for a given sequential design, a
system of confidence intervals satisfying the required probability statement in repetitions
of the sequential sampling. This can be done as follows. In any sequential binomial
procedure with fixed boundary points, the latter may be ordered in terms of increasing $\hat\theta$,
points with identical values of $\hat\theta$ being arranged in some arbitrary order. For some designs
this ordering will correspond to a natural ordering in terms of proximity in the $(n, r)$ plane.
For other designs an ordering in terms of proximity may involve slight departures from one
in terms of increasing $\hat\theta$. For any two boundary points, $B_i$ and $B_j$, we shall write $B_i > B_j$
or $B_i < B_j$ according as $B_i$ follows or precedes $B_j$, respectively, in an ordering which
corresponds to increasing $\hat\theta$ (apart from possible slight departures of the type referred to above).
Then the sequential confidence interval for $\theta$ with coefficient $1 - 2\gamma$, at any point $B_j$, is
defined as $(\theta^*_L, \theta^*_U)$, where $P(B \ge B_j \mid \theta^*_L) = P(B \le B_j \mid \theta^*_U) = \gamma$. That this system satisfies the
required probability statement clearly follows by an argument analogous to that used in
deriving the classical interval, provided that the distribution function over the boundary
points is a monotonic function of $\theta$. The latter condition will not necessarily be fulfilled for
every ordering of the type described above, but it appears to be fulfilled in the particular
instances considered below. A similar method may be followed to obtain sequential
confidence intervals for other than binomial sampling, but only the binomial case will be
considered in detail here.
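The sequential limits can be computed in the same way once the boundary points are ordered. The sketch below works under stated assumptions. Points are ordered by $\hat\theta$, with ties broken by $n$ (one of the arbitrary choices the text allows). Limits are found by bisection, which relies on the monotonicity established below. The design is our reading of the truncated Example 1 of §2: the rounded Wald lines, truncation at $n = 40$ with $r \ge 31$ allocated to the upper boundary, and early stopping once the allocation at $n = 40$ can no longer change. That last rule is an assumption inferred from the boundary points listed in Table 1, not something stated in the text. All function names are hypothetical.

```python
from fractions import Fraction


def boundary_counts(stop, n_max):
    """{(n, r): (N, N1)} for a closed design; N1 counts paths from (1, 1)."""
    current, points = {0: (1, 0), 1: (1, 1)}, {}
    for n in range(1, n_max + 1):
        nxt = {}
        for r, (N, N1) in current.items():
            if stop(n, r):
                points[(n, r)] = (N, N1)
            elif n == n_max:
                raise ValueError(f"design is not closed: ({n}, {r}) continues")
            else:
                for r_next in (r, r + 1):  # failure, then success
                    a, b = nxt.get(r_next, (0, 0))
                    nxt[r_next] = (a + N, b + N1)
        current = nxt
    return points


def stop_example1(n, r):
    """Assumed reconstruction of the truncated Example 1 design."""
    upper = 4 * r >= 3 * n + 6 or r >= 31               # r >= 1.50 + 0.75n, or r >= 31
    lower = 4 * r <= 3 * n - 5 or r + (40 - n) <= 30    # r <= -1.25 + 0.75n, or 31 unreachable
    return upper or lower


def solve_increasing(f, target, tol=1e-12):
    """Bisection on (0, 1) for f(theta) = target, with f increasing."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def sequential_intervals(points, gamma):
    """{B_j: (lower, upper)} with P(B >= B_j | lower) = P(B <= B_j | upper) = gamma."""
    # Order by theta-hat = r/n; ties are broken by n, an arbitrary choice.
    order = sorted(points, key=lambda p: (Fraction(p[1], p[0]), p[0]))

    def cdf(j, t):  # P(B <= B_j | theta = t)
        return sum(points[(n, r)][0] * t**r * (1 - t) ** (n - r)
                   for n, r in order[: j + 1])

    limits = {}
    for j, point in enumerate(order):
        lower = upper = None
        if j > 0:
            lower = solve_increasing(lambda t: 1 - cdf(j - 1, t), gamma)
        if j < len(order) - 1:
            upper = solve_increasing(lambda t: -cdf(j, t), -gamma)
        limits[point] = (lower, upper)
    return limits


points = boundary_counts(stop_example1, 40)
limits = sequential_intervals(points, 0.025)
print(limits[(2, 0)], limits[(10, 6)], limits[(6, 6)])
```

At the extreme points (2, 0) and (6, 6) only one path contributes to the relevant tail. The sequential limits there therefore coincide with the classical ones, $1 - \sqrt{\gamma}$ and $\gamma^{1/6}$.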

Sufficient conditions for the monotonicity of the distribution function with regard to $\theta$
(and hence for the existence of a set of confidence intervals) are that (a) the boundary
points are ordered in terms of $\hat\theta$; and (b) the probability is unity that sampling will end at
one of the boundary points, for all $\theta$.

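For, let the boundary points be $(n_i, r_i)$ $(i = 1, 2, \ldots, k)$, define $\hat\theta_i = r_i/n_i$, and denote the
values of $N$ by $N_i$. If (a) is satisfied, we have

$$\hat\theta_i \le \hat\theta_{i+1} \qquad (i = 1, 2, \ldots, k-1). \tag{1}$$

The distribution function $P(s, \theta)$ will be defined as the sum of the probabilities, given $\theta$,
of reaching the points $(n_i, r_i)$ for $i = 1, 2, \ldots, s$. Then, if (b) is satisfied, $P(s, \theta)$ can be written
in either of two forms (for $s = 1, 2, \ldots, k-1$):

$$P(s, \theta) = \sum_{i=1}^{s} N_i\,\theta^{r_i}(1-\theta)^{n_i-r_i} \tag{2}$$

$$\phantom{P(s, \theta)} = 1 - \sum_{i=s+1}^{k} N_i\,\theta^{r_i}(1-\theta)^{n_i-r_i}. \tag{3}$$

Hence

$$\frac{\partial P}{\partial \theta} = \sum_{i=1}^{s} N_i\,\theta^{r_i-1}(1-\theta)^{n_i-r_i-1}\,(r_i - n_i\theta) \tag{4}$$

$$\phantom{\frac{\partial P}{\partial \theta}} = -\sum_{i=s+1}^{k} N_i\,\theta^{r_i-1}(1-\theta)^{n_i-r_i-1}\,(r_i - n_i\theta). \tag{5}$$

Since $r_i - n_i\theta = n_i(\hat\theta_i - \theta)$, by (1) and (4), for $\hat\theta_s < \theta < 1$, every term of (4) is negative and
$\partial P/\partial\theta < 0$; by (1) and (5), for $0 < \theta < \hat\theta_{s+1}$, every term of (5) is negative and $\partial P/\partial\theta < 0$. This proves
that $\partial P/\partial\theta < 0$ for all $\theta$ in $(0, 1)$, except possibly in two situations: (i) for $\theta = \hat\theta_s$ in the case
$\hat\theta_s = \hat\theta_{s+1}$; but in this case $\partial P/\partial\theta \le 0$, the equality holding only if $\hat\theta_i$ is constant for all $i$,
and this possibility is excluded by condition (b); (ii) for $s = k$; here $P(s, \theta) = 1$ and $\partial P/\partial\theta = 0$;
this exception is trivial and raises no difficulties.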
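The monotonicity argument can also be checked numerically for a particular design. The sketch below uses hypothetical names. It takes Example 1 as reconstructed under an assumed truncation rule: rounded Wald lines, $r \ge 31$ allocated to the upper boundary at $n = 40$, and early stopping once the $n = 40$ allocation is settled, the last being our inference from Table 1. It orders the boundary points by $\hat\theta$, so that condition (a) holds, and verifies that every $P(s, \theta)$ with $s < k$ is strictly decreasing on a grid of $\theta$ values. A grid check of this kind illustrates the result but does not replace the proof.

```python
from fractions import Fraction


def boundary_counts(stop, n_max):
    """{(n, r): (N, N1)} for a closed design; N1 counts paths from (1, 1)."""
    current, points = {0: (1, 0), 1: (1, 1)}, {}
    for n in range(1, n_max + 1):
        nxt = {}
        for r, (N, N1) in current.items():
            if stop(n, r):
                points[(n, r)] = (N, N1)
            elif n == n_max:
                raise ValueError(f"design is not closed: ({n}, {r}) continues")
            else:
                for r_next in (r, r + 1):  # failure, then success
                    a, b = nxt.get(r_next, (0, 0))
                    nxt[r_next] = (a + N, b + N1)
        current = nxt
    return points


def stop_example1(n, r):
    """Assumed reconstruction of the truncated Example 1 design."""
    upper = 4 * r >= 3 * n + 6 or r >= 31
    lower = 4 * r <= 3 * n - 5 or r + (40 - n) <= 30
    return upper or lower


def distribution_function(points, order, s, theta):
    """P(s, theta): probability of stopping at one of the first s ordered points."""
    return sum(points[(n, r)][0] * theta**r * (1 - theta) ** (n - r)
               for n, r in order[:s])


points = boundary_counts(stop_example1, 40)
order = sorted(points, key=lambda p: (Fraction(p[1], p[0]), p[0]))  # condition (a)
grid = [i / 100 for i in range(1, 100)]
monotone = all(
    distribution_function(points, order, s, grid[i + 1])
    < distribution_function(points, order, s, grid[i])
    for s in range(1, len(order))
    for i in range(len(grid) - 1)
)
print("P(s, theta) strictly decreasing on the grid for every s < k:", monotone)
```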

Conditions (a) and (b) are sufficient for monotonicity, but not necessary. Simple examples
may be constructed, in which the boundary points are not ordered strictly in terms of $\hat\theta$,
but for which, nevertheless, the distribution functions are monotonic.

In the examples below, the distributions over the sequential boundary points have been
obtained for suitable values of $\theta$, and the limits $\theta^*_L$ and $\theta^*_U$ have been obtained for each
boundary point by interpolation. It is thus possible to compare the two sets of intervals, which
for convenience we shall call ‘classical’ and ‘sequential’, and observe in what respects they
differ. In addition, I have calculated for various values of $\theta$ the probability that the true
value is excluded from each set of intervals. This gives us some idea of the extent to which
we may be misled, in using classical limits, when the sampling is sequential.
Interpolation in $P$ with respect to $\theta$ can conveniently be done after transformation of
both $P$ and $\theta$ to equivalent normal deviates. Linear interpolation on the transformed
variables has usually been found to be reliable. Low values of $\theta_L$ and $\theta^*_L$, or high values of
$\theta_U$ and $\theta^*_U$, which occur at a few of the extreme boundary points, could not be obtained by
interpolation, and were calculated to the required accuracy by exact or iterative solution
of the appropriate algebraic equation. The tabulated values of the confidence intervals,
and those of the unbiased estimates, may occasionally be inaccurate by one unit in the last
digit quoted, owing to rounding-off errors.
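The interpolation step can be sketched as follows. This is an illustration built on assumptions, not the author's procedure. `probit_interpolate` is a hypothetical helper that maps both $\theta$ and $P$ to normal deviates with Python's `statistics.NormalDist`, finds the tabulated interval that brackets the target, and interpolates linearly on the transformed scale. As a check, it recovers the classical lower 95% limit for 6 successes in 10 trials from values of $P(r \ge 6 \mid \theta)$ tabulated at steps of 0.05.

```python
from math import comb
from statistics import NormalDist

STD_NORMAL = NormalDist()


def upper_tail(r0, n0, theta):
    """P(r >= r0 | theta) for fixed-size binomial sampling."""
    return sum(comb(n0, k) * theta**k * (1 - theta) ** (n0 - k)
               for k in range(r0, n0 + 1))


def probit_interpolate(thetas, probs, target):
    """theta at which P = target, interpolating linearly between the normal
    deviates of theta and of P on a tabulated grid (all values in (0, 1))."""
    z = STD_NORMAL.inv_cdf
    zt = z(target)
    for (t0, p0), (t1, p1) in zip(zip(thetas, probs), zip(thetas[1:], probs[1:])):
        z0, z1 = z(p0), z(p1)
        if z0 != z1 and min(z0, z1) <= zt <= max(z0, z1):
            w = (zt - z0) / (z1 - z0)
            return STD_NORMAL.cdf(z(t0) + w * (z(t1) - z(t0)))
    raise ValueError("target is not bracketed by the tabulated values")


# Lower 95% limit for 6 successes in 10 trials, from a grid with step 0.05.
grid = [k / 20 for k in range(1, 20)]
table = [upper_tail(6, 10, t) for t in grid]
print(probit_interpolate(grid, table, 0.025))
```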


2. EXAMPLE 1


Consider a Wald probability ratio sequential test, designed to distinguish between two
values of a binomial parameter: $\theta_0 = 0.50$ (with a probability of error of the first kind,
$\alpha = 0.025$), and $\theta_1 = 0.92$ (with a probability of error of the second kind, $\beta = 0.05$). The two
linear boundaries have equations

$$r = 1.489 + 0.750n \quad\text{and}\quad r = -1.216 + 0.750n.$$

For convenience we shall round off the coefficients, and consider the closely similar pair of
boundaries with equations

$$r = 1.50 + 0.75n \quad\text{and}\quad r = -1.25 + 0.75n.$$
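The coefficients quoted above come from the usual form of Wald's test for a binomial parameter. Sampling continues while the log likelihood ratio, $r\log(\theta_1/\theta_0) + (n - r)\log\{(1-\theta_1)/(1-\theta_0)\}$, lies between $\log\{\beta/(1-\alpha)\}$ and $\log\{(1-\beta)/\alpha\}$. This gives two parallel lines $r = h + sn$. The short sketch below reproduces the Example 1 coefficients; the function name `wald_lines` is hypothetical.

```python
from math import log


def wald_lines(theta0, theta1, alpha, beta):
    """Intercepts (upper, lower) and common slope of the binomial SPRT lines r = h + s*n."""
    d = log(theta1 * (1 - theta0) / (theta0 * (1 - theta1)))
    slope = log((1 - theta0) / (1 - theta1)) / d
    upper = log((1 - beta) / alpha) / d     # crossing it favours theta1
    lower = log(beta / (1 - alpha)) / d     # crossing it favours theta0
    return upper, lower, slope


print(wald_lines(0.50, 0.92, 0.025, 0.05))
```

Applied to $\theta_0 = 0.05$, $\theta_1 = 0.17$, $\alpha = \beta = 0.03$, the same function gives lines close to the rounded Example 2 boundaries $r = \pm 2.5 + 0.10n$. This agrees with the remark made about Example 2 in §3.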


The maximum value of the average sample number is about 10 (according to the usual
approximating formula) and we might expect the operating characteristic to be affected
to a fairly small degree if the procedure is truncated at a sample size of about 30 or more.
We have truncated at $n = 40$, and the two boundary points with $n = 40$ have been allocated
one to each boundary. The co-ordinates of the boundary points are shown in the first two
columns of Table 1. The points are arranged, for convenience, in order of increasing or
decreasing $n$ on each boundary, rather than increasing $\hat\theta$. The boundaries are illustrated
in Fig. 1.
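The boundary points of Table 1 can be regenerated from a stopping rule, provided its assumptions are stated explicitly. The sketch below uses the rounded Wald lines and truncation at $n = 40$, with $r \ge 31$ allocated to the upper boundary. It also applies early stopping on the lower boundary once $r = 31$ can no longer be reached by $n = 40$. That rule is our own reading of Table 1 and is not stated in the text. Under this rule the enumeration should yield the 31 lower and 10 upper points listed in Table 1, and unbiased estimates such as $\hat\theta_u = 11/14 \approx 0.786$ at $(10, 6)$. Helper names are hypothetical.

```python
from fractions import Fraction


def boundary_counts(stop, n_max):
    """{(n, r): (N, N1)} for a closed design; N1 counts paths from (1, 1)."""
    current, points = {0: (1, 0), 1: (1, 1)}, {}
    for n in range(1, n_max + 1):
        nxt = {}
        for r, (N, N1) in current.items():
            if stop(n, r):
                points[(n, r)] = (N, N1)
            elif n == n_max:
                raise ValueError(f"design is not closed: ({n}, {r}) continues")
            else:
                for r_next in (r, r + 1):  # failure, then success
                    a, b = nxt.get(r_next, (0, 0))
                    nxt[r_next] = (a + N, b + N1)
        current = nxt
    return points


def side_example1(n, r):
    """Boundary containing (n, r), if any, under our assumed reading of Example 1."""
    if 4 * r >= 3 * n + 6 or r >= 31:             # r >= 1.50 + 0.75n, or r >= 31
        return "upper"
    if 4 * r <= 3 * n - 5 or r + (40 - n) <= 30:  # r <= -1.25 + 0.75n, or 31 unreachable
        return "lower"
    return None


points = boundary_counts(lambda n, r: side_example1(n, r) is not None, 40)
upper = sorted(p for p in points if side_example1(*p) == "upper")
lower = sorted(p for p in points if side_example1(*p) == "lower")
estimates = {p: Fraction(N1, N) for p, (N, N1) in points.items()}
print(len(lower), "lower points;", len(upper), "upper points")
print("unbiased estimate at (10, 6):", estimates[(10, 6)], float(estimates[(10, 6)]))
```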

The third and fourth columns of Table 1 show the maximum likelihood and unbiased
estimates of $\theta$, respectively. An unexpected feature is the rapidity with which the unbiased
estimator approaches a value of about 0.80 as $n$ increases; for all $n > 15$, $\hat\theta_u$ lies between
0.799 and 0.801. Table 2 shows the mean, variance, and mean-square error about $\theta$ of the
two estimates, for various values of $\theta$. Notice first that $\hat\theta$ is biased towards 0 or 1, away from
an intermediate value between 0.8 and 0.9. This provides a reason for the phenomenon
noticed above, that $\hat\theta_u$ differs from $\hat\theta$ in being brought closer to a value of about 0.80. A
consequence of this situation is that for $\theta$ = 0.7, 0.8 and 0.9 (for which values there is a relatively
high chance of terminating at high values of $n$, where $\hat\theta_u$ is fairly constant), the variance of
$\hat\theta_u$ is less than that of $\hat\theta$. For more extreme values of $\theta$ (0.60 and less, or 0.95 and greater),
the variance of $\hat\theta_u$ is greater than that of $\hat\theta$, and (except for $\theta = 0.6$) greater even than the
mean-square error of $\hat\theta$.

Girshick et al. (1946) point out that if the region inside the boundaries has a narrow throat,
with only one accessible point for some value of $n$, then $\hat\theta_u$ will be constant for all boundary
points with higher values of $n$. In the present example the width of the accessible region
(for $n < 38$) varies between 2 and 3 points, but apparently the region is sufficiently narrow
to produce a similar effect. The unbiased estimator, incidentally, is unique since the region
is ‘simple’, in the sense of Girshick et al. (1946).
For calculation of confidence limits, the boundary points have been arranged, for
convenience, in the order shown in Table 1, which coincides with the ordering in terms of
increasing $\hat\theta_u$, but differs slightly from that in terms of increasing $\hat\theta$. The probability
distributions over the boundary points were calculated for $\theta$ = 0.30 (0.05) 0.95, and for the
additional values 0.475, 0.525 and 0.975. The two sets of confidence limits, ‘classical’ and
‘sequential’, were calculated as described above, for confidence coefficients of 0.90 and
0.95. These are given in Table 1, and for the latter, visually in Fig. 2. The sequential limits
tend to be wider apart than the classical limits, for the higher values of $n$. For lower values
of $n$, both the upper and the lower sequential limits are displaced relative to the classical
limits, being higher for low values of $\hat\theta$ and lower for high values of $\hat\theta$. In other words, they
are displaced towards the fairly constant values which they assume at high values of $n$.
Fig. 1. Boundaries for Example 1.
Fig. 2. 95% confidence limits for $\theta$ (Example 1).
The probabilities of exclusion of the true value, $\theta$, from each set of intervals are given in
Table 2, for various values of $\theta$. We encounter here two difficulties familiar in problems of
confidence intervals with discrete distributions: the probabilities of exclusion above the upper
limit or below the lower limit are in general less than the nominal value $\gamma$; and exclusion
above the upper limit, or below the lower limit, is impossible for sufficiently low, or high,
values of $\theta$, respectively. However, Table 2 shows that the probability of exclusion beyond one
of the classical limits can be greater than the nominal value $\gamma$, at least for values of $\theta$ near
the middle of the range; and that it tends to be considerably less than $\gamma$ for extremely high
or extremely low values of $\theta$. These results are in line with the general nature of the
differences between the two sets of intervals, commented on above.

It may be of interest to note that the exact probabilities of reaching the upper boundary,
for $\theta$ = 0.5 and 0.925 (which is close to the value of 0.92 initially considered), are respectively
0.0250 (agreeing well with the nominal value of 0.025) and 0.9729 (as compared with the
nominal 0.95).
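These two probabilities can be recomputed under an assumed reconstruction of the design: rounded Wald lines, truncation at $n = 40$ with $r \ge 31$ allocated to the upper boundary, and early stopping once the $n = 40$ allocation is settled, the last being our inference from Table 1. The probability of ending on the upper boundary is then a single sum over the upper boundary points. The sketch below, with hypothetical names, computes it for $\theta = 0.5$ and $0.925$ for comparison with the exact values 0.0250 and 0.9729 quoted above.

```python
def boundary_counts(stop, n_max):
    """{(n, r): (N, N1)} for a closed design; N1 counts paths from (1, 1)."""
    current, points = {0: (1, 0), 1: (1, 1)}, {}
    for n in range(1, n_max + 1):
        nxt = {}
        for r, (N, N1) in current.items():
            if stop(n, r):
                points[(n, r)] = (N, N1)
            elif n == n_max:
                raise ValueError(f"design is not closed: ({n}, {r}) continues")
            else:
                for r_next in (r, r + 1):  # failure, then success
                    a, b = nxt.get(r_next, (0, 0))
                    nxt[r_next] = (a + N, b + N1)
        current = nxt
    return points


def side_example1(n, r):
    """Boundary containing (n, r), if any, under our assumed reading of Example 1."""
    if 4 * r >= 3 * n + 6 or r >= 31:
        return "upper"
    if 4 * r <= 3 * n - 5 or r + (40 - n) <= 30:
        return "lower"
    return None


def stop_probabilities(points, side, theta):
    """Total probability of ending on each boundary for a given theta."""
    totals = {}
    for (n, r), (N, _) in points.items():
        key = side(n, r)
        totals[key] = totals.get(key, 0.0) + N * theta**r * (1 - theta) ** (n - r)
    return totals


points = boundary_counts(lambda n, r: side_example1(n, r) is not None, 40)
for theta in (0.5, 0.925):
    print(theta, stop_probabilities(points, side_example1, theta))
```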

3. EXAMPLE 2 
The design used here was intended as a truncated version of a probability ratio sequential
test to distinguish between the hypotheses that $\theta = \theta_0 = 0.05$ and $\theta = \theta_1 = 0.15$, with equal
probabilities of error of the first and second kinds ($\alpha = \beta = 0.05$). The scheme is truncated
at $n = 70$. After rounding off the coefficients, the equations of the boundaries were taken
to be $r = 2.5 + 0.10n$ and $r = -2.5 + 0.10n$. The rounded-off values are actually fairly close
to those appropriate for $\theta_0 = 0.05$, $\theta_1 = 0.17$, $\alpha = \beta = 0.03$. The co-ordinates of the boundary
points are given in Table 4, and the boundaries are illustrated in Fig. 3.

In this example, no calculations of unbiased estimates were made. For calculation
of 95% confidence limits ($\gamma = 0.025$), the boundary points were arranged in the order
shown in Table 4, which again differs to some extent from the ordering in terms of
increasing $\hat\theta$. The probability distributions over the boundary points were calculated for
$\theta$ = 0.03 (0.02) 0.21, and, on the upper boundary only, for $\theta$ = 0.3 (0.1) 0.8. The two sets
of limits are given in Table 4 and are shown in Fig. 4. They are considerably less discrepant
than was the case in Example 1. There is no marked tendency for the sequential limits to
be wider apart than the classical limits, for large $n$. There is, however, the same relative
displacement noted in Example 1, but to a much smaller degree; that is, the sequential
limits are usually higher than the classical limits on the lower boundary, and lower on the
upper boundary. Table 3 shows that for some values of $\theta$ the probability that the interval
excludes $\theta$, in a specified direction, may be appreciably higher than the specified upper
bound of 0.025.
Table 3 also gives the mean and variance of the maximum likelihood estimator, $\hat\theta$, for
the same values of $\theta$. The bias in $\hat\theta$ is of the same character as that described in Example 1:
for the lower values of $\theta$, $\hat\theta$ is biased downwards, while for the higher values the bias is
upwards.

We note finally that the exact probabilities of reaching the upper boundary, for $\theta$ = 0.05
and 0.17, are respectively 0.031 (corresponding to the nominal value of 0.03) and 0.922
(corresponding to the nominal value of 0.97). The approximate formulae are apparently
rather misleading here.
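A corresponding sketch can be written for Example 2, with its assumptions stated. It uses the rounded lines $r = \pm 2.5 + 0.10n$, multiplied by 10 for integer arithmetic, and truncation at $n = 70$ with $r \ge 8$ allocated to the upper boundary. It also stops early on either side once the allocation at $n = 70$ is settled. The early-stopping rule is our inference from the boundary points listed in Table 4. For example, it accounts for the upper points with $r = 8$ and $n = 56, \ldots, 70$, which lie below the upper line. The probabilities of reaching the upper boundary at $\theta = 0.05$ and $0.17$ can then be compared with the values 0.031 and 0.922 quoted above. All names are hypothetical.

```python
def boundary_counts(stop, n_max):
    """{(n, r): (N, N1)} for a closed design; N1 counts paths from (1, 1)."""
    current, points = {0: (1, 0), 1: (1, 1)}, {}
    for n in range(1, n_max + 1):
        nxt = {}
        for r, (N, N1) in current.items():
            if stop(n, r):
                points[(n, r)] = (N, N1)
            elif n == n_max:
                raise ValueError(f"design is not closed: ({n}, {r}) continues")
            else:
                for r_next in (r, r + 1):  # failure, then success
                    a, b = nxt.get(r_next, (0, 0))
                    nxt[r_next] = (a + N, b + N1)
        current = nxt
    return points


def side_example2(n, r):
    """Boundary containing (n, r), if any, under our assumed reading of Example 2."""
    if 10 * r >= 25 + n or r >= 8:              # r >= 2.5 + 0.10n, or certain to end upper
        return "upper"
    if 10 * r <= n - 25 or r + (70 - n) <= 7:   # r <= -2.5 + 0.10n, or r = 8 unreachable
        return "lower"
    return None


def stop_probabilities(points, side, theta):
    """Total probability of ending on each boundary for a given theta."""
    totals = {}
    for (n, r), (N, _) in points.items():
        key = side(n, r)
        totals[key] = totals.get(key, 0.0) + N * theta**r * (1 - theta) ** (n - r)
    return totals


points = boundary_counts(lambda n, r: side_example2(n, r) is not None, 70)
for theta in (0.05, 0.17):
    print(theta, stop_probabilities(points, side_example2, theta))
```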


4. EXAMPLE 3 


This is one of the ‘restricted sequential procedures’ described by Armitage (1957). The
maximum number of observations is 44. On the upper boundary the ratio of the likelihoods
of $\theta = 0.8$ and $\theta = 0.5$ is constant; on the lower boundary the likelihood ratio of $\theta = 0.5$
to $\theta = 0.2$ is constant. The procedure has the properties that when $\theta = 0.5$ the probabilities
of reaching each of the two outer boundaries are 0.020; and when $\theta = 0.8$ (0.2) the
probability of reaching the upper (lower) boundary is 0.947. The boundary points are given in
Table 5 and illustrated in Fig. 5 (cf. also Fig. 2 of Armitage (1957), where different
co-ordinates are used). Since the design is symmetrical, only the boundary points with $r \le \tfrac{1}{2}n$
are tabulated. Thus, the upper portion of the middle boundary consists of the points
$n = 27, 28, \ldots, 44$; $r = n - 13$. The upper boundary consists of the points $n = 8, 11, \ldots, 44$;
$r = (8 + 2n)/3$.
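The restricted design can also be written as a stopping predicate. Under our reading of the description above, which is an assumption and not the author's code, sampling stops on the upper boundary when $3r \ge 2n + 8$, on the lower boundary when $3r \le n - 8$, and on the middle boundary once there have been at least 13 successes and at least 13 failures. This reproduces the lower boundary $r = (n-8)/3$, $n = 8, 11, \ldots, 44$, and the lower half of the middle boundary, $r = 13$, $n = 26, \ldots, 44$, as listed in Table 5. It also gives hand-checkable unbiased estimates such as $\hat\theta_u = 1/8$ at $(11, 1)$. Helper names are hypothetical.

```python
from fractions import Fraction


def boundary_counts(stop, n_max):
    """{(n, r): (N, N1)} for a closed design; N1 counts paths from (1, 1)."""
    current, points = {0: (1, 0), 1: (1, 1)}, {}
    for n in range(1, n_max + 1):
        nxt = {}
        for r, (N, N1) in current.items():
            if stop(n, r):
                points[(n, r)] = (N, N1)
            elif n == n_max:
                raise ValueError(f"design is not closed: ({n}, {r}) continues")
            else:
                for r_next in (r, r + 1):  # failure, then success
                    a, b = nxt.get(r_next, (0, 0))
                    nxt[r_next] = (a + N, b + N1)
        current = nxt
    return points


def side_example3(n, r):
    """Boundary containing (n, r), if any, under our assumed reading of Example 3."""
    if 3 * r >= 2 * n + 8:              # upper outer boundary r = (8 + 2n)/3
        return "upper"
    if 3 * r <= n - 8:                  # lower outer boundary r = (n - 8)/3
        return "lower"
    if r >= 13 and n - r >= 13:         # middle boundary: 13 successes and 13 failures
        return "middle"
    return None


def stop_probabilities(points, side, theta):
    """Total probability of ending on each boundary for a given theta."""
    totals = {}
    for (n, r), (N, _) in points.items():
        key = side(n, r)
        totals[key] = totals.get(key, 0.0) + N * theta**r * (1 - theta) ** (n - r)
    return totals


points = boundary_counts(lambda n, r: side_example3(n, r) is not None, 44)
estimates = {p: Fraction(N1, N) for p, (N, N1) in points.items()}
print(estimates[(11, 1)], estimates[(14, 2)], estimates[(26, 13)])
print(stop_probabilities(points, side_example3, 0.5))
```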

The unbiased estimates are displaced, relative to the maximum likelihood estimates,
towards a value of about 0.375. Similarly, for the boundary points not shown in Table 5,
the displacement is towards a value of 0.625. Table 6 shows that the maximum likelihood
estimator, $\hat\theta$, is biased away from the value 0.5, at least at those values of $\theta$ for which
computations have been carried out. The unbiased estimator, $\hat\theta_u$, has larger variance and larger
mean-square error than $\hat\theta$ for a range of values of $\theta$ furthest from 0.5, namely, 0.2 or less,
and 0.8 or greater.
Fig. 5. Boundaries for Example 3.
Fig. 6. 95% confidence limits for $\theta$ (Example 3); dash-dot line, maximum likelihood estimate.


The region of accessible points in this restricted procedure is not ‘simple’ in the sense of
Girshick et al. (1946) and the unbiased estimator is therefore not unique. However,
alternative unbiased estimators differ only in the values assigned to the three central boundary
points of the middle boundary.

The two sets of 95% confidence limits are given in Table 5 and illustrated in Fig. 6. As in
Example 2, the widths of the two intervals at any boundary point are fairly similar. As in
the previous examples the sequential interval tends to be displaced upwards, relative to the
classical interval, at low values of $\hat\theta$. For points on the middle boundary with $0.35 < \hat\theta < 0.50$,
on the other hand, the mid-point of the sequential interval is displaced away from 0.5,
relative to that of the classical interval. The probabilities of exclusion are given in Table 6.
For the classical intervals these probabilities just exceed the nominal value of 0.025 when
$\theta = 0.5$, but otherwise remain less than or equal to the probabilities for the sequential
limits.
5. DISCUSSION 
Examples 1 and 2 both involve parallel-line boundaries. In each example two features
were observed: for extreme values of $\theta$, the maximum likelihood estimator $\hat\theta$ is biased (in a
more extreme direction); and for boundary points corresponding to extreme values of $\hat\theta$ the
sequential confidence limits are displaced in relation to the classical limits (in a less
extreme direction). For sufficiently high or sufficiently low values of $\theta$, the probability of
reaching one boundary is much higher than that of reaching the other, and (as Cox (1952)
remarks) the properties of the double-boundary procedure should be similar to those of
a single-boundary procedure.

Cox (1952) has shown that in binomial sampling with a single linear boundary, $\hat\theta$ is
asymptotically biased in the direction observed above. That is, if the equation to the
boundary is $r = a + bn$ $(a > 0, b > 0)$, then $\hat\theta$ is biased upwards for $\theta > b$. This result may be
obtained also by considering the appropriate diffusion approximation.
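Cox's result for a single linear boundary can be illustrated numerically. The sketch below is an illustration, not a proof. It uses a hypothetical boundary $r = 2 + 0.5n$ ($a = 2$, $b = 0.5$), truncated at $n = 200$ so that the enumeration terminates. The truncation is our assumption; for the values of $\theta$ tried, paths that survive to $n = 200$ should carry negligible probability. The sketch computes the exact mean of $\hat\theta$ for several $\theta > b$, and in line with Cox's result these means should exceed $\theta$. Function names are hypothetical.

```python
def boundary_counts(stop, n_max):
    """{(n, r): (N, N1)} for a closed design; N1 counts paths from (1, 1)."""
    current, points = {0: (1, 0), 1: (1, 1)}, {}
    for n in range(1, n_max + 1):
        nxt = {}
        for r, (N, N1) in current.items():
            if stop(n, r):
                points[(n, r)] = (N, N1)
            elif n == n_max:
                raise ValueError(f"design is not closed: ({n}, {r}) continues")
            else:
                for r_next in (r, r + 1):  # failure, then success
                    a, b = nxt.get(r_next, (0, 0))
                    nxt[r_next] = (a + N, b + N1)
        current = nxt
    return points


def single_boundary_stop(n, r):
    """Hypothetical single boundary r >= 2 + 0.5n (times 2), truncated at n = 200."""
    return 2 * r >= 4 + n or n == 200


def ml_mean(points, theta):
    """Exact E(theta-hat) and total probability over the stopping points."""
    mean = total = 0.0
    for (n, r), (N, _) in points.items():
        p = N * theta**r * (1 - theta) ** (n - r)
        mean += p * r / n
        total += p
    return mean, total


points = boundary_counts(single_boundary_stop, 200)
for theta in (0.6, 0.7, 0.8, 0.9):
    mean, total = ml_mean(points, theta)
    print(f"theta = {theta}: E(theta-hat) = {mean:.4f}, total probability = {total:.6f}")
```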

As was pointed out in §2, a natural consequence of the direction of the bias in $\hat\theta$ is that
$\hat\theta_u$ tends to be displaced towards a central value. For values of $\theta$ sufficiently close to this
central value, $\hat\theta_u$ tends to have a smaller mean-square error than has $\hat\theta$. The values of $\theta$ for
which the reverse situation holds are all sufficiently near to 0 or 1 to yield a high probability
of hitting a particular one of the boundaries (one of the outer boundaries in Example 3).
Indeed, the values of $\theta$ for which $\hat\theta$ and $\hat\theta_u$ have equal mean-square errors appear to be fairly
close (in both Examples 1 and 3) to the values of $\theta$ appearing in the specification of the
boundaries (0.50 and 0.92 in Example 1, and 0.2 and 0.8 in Example 3); but there is no
evident reason why this should generally be true.

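The other observation, of the direction of displacement of the confidence intervals, can
also be verified for single-boundary sampling, by the following direct argument. Let the
equation to the boundary be $r = a + bn$ $(a > 0, b > 0)$, and let $B$ be the boundary point
$(n_0, r_0)$, where $\hat\theta_0 = r_0/n_0$. Suppose that a sample path, when continued sufficiently far,
crosses the sequential boundary at a point where $\hat\theta = \hat\theta_s$, and the fixed-sample-size boundary
$n = n_0$ at a point where $\hat\theta = \hat\theta_f$. Assume that $\hat\theta$ on the boundary is a decreasing function of $n$. Then any path
with $\hat\theta_f \ge \hat\theta_0$ must have crossed the sequential boundary at $n \le n_0$, which implies $\hat\theta_s \ge \hat\theta_0$.
Hence

$$P(\hat\theta_s \ge \hat\theta_0) \ge P(\hat\theta_f \ge \hat\theta_0)$$

and

$$P(\hat\theta_s > \hat\theta_0) \ge P(\hat\theta_f > \hat\theta_0).$$

Since the probabilities on both sides of each inequality increase with $\theta$, the sequential
probabilities reach $\gamma$ (for the lower limit) and $1 - \gamma$ (for the upper limit) at smaller values of $\theta$.
It follows that $\theta^*_L \le \theta_L$ and $\theta^*_U \le \theta_U$, where, as before, $(\theta_L, \theta_U)$ are the fixed-sample-size limits and
$(\theta^*_L, \theta^*_U)$ are the sequential limits.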

These results apply similarly to the lower boundary; thus, for $\theta < b$, $\hat\theta$ is biased downwards,
$\theta^*_L \ge \theta_L$ and $\theta^*_U \ge \theta_U$.

In Example 1, but not appreciably in Example 2, the sequential limits are wider apart
than the classical limits, for boundary points corresponding to high values of $n$. There is
reason to believe that this will be a general finding if truncation is performed at sufficiently
high values of $n$. For, consider a Wald procedure to distinguish between the hypotheses
that $\theta = \theta_0$ and $\theta = \theta_1$, with both probabilities of error equal to $\gamma$. At points infinitely far
up either boundary the $100(1 - 2\gamma)$% sequential confidence interval will be $(\theta_0, \theta_1)$, whereas
the width of the classical interval will tend to zero as $n \to \infty$. It seems a reasonable
conjecture that the probability of exclusion from the classical interval will exceed that from the
sequential interval when $\theta$ is close to $b$. The position is somewhat obscured in our examples
by the effects of discontinuity, but what evidence there is confirms the conjecture.

It should be noted that sets of central confidence intervals exist, other than those given 
by the method described in this paper, and some of these may more closely resemble the 
classical limits than do those considered here. 
Example 3 differs from Examples 1 and 2 in having a design with two ‘channels’, which
rather confuses the problem for intermediate values of $\theta$. However, for extremely high (or
extremely low) values of $\theta$, only the upper (or lower) boundary need be considered, and the
situation is similar to that discussed in relation to the first two examples. We find that $\hat\theta$
is biased away from 0.5, and the sequential confidence limits are displaced towards 0.5.
The close similarity between the two systems of limits in this example is interesting. It
may be a reflexion of the fact that for all except very extreme values of $\theta$ there is a fairly
high probability of reaching the middle boundary, where the variability in $n$ is relatively
small. This type of procedure was developed in an attempt to avoid the very high variability
in sample number which is associated with parallel-line procedures.


SUMMARY 


A method is described of obtaining confidence limits for a binomial probability, for a class 
of sequential procedures with fixed boundaries. Confidence limits, and unbiased estimates 
of the parameter, have been calculated for all the boundary points in three closed sequential 
designs: two truncated Wald procedures, and one ‘restricted’ procedure. Reasons are 
suggested for two observed tendencies: at boundary points which are reached after a small 
number of observations, corresponding to high or low values of the estimated probability, 
the sequential confidence limits are shifted in a less extreme direction relative to the limits 
given by the usual fixed-sample-size formulae; for extreme values of the parameter the 
maximum likelihood estimator is biased in a more extreme direction, and the unbiased 
estimator is correspondingly shifted in a less extreme direction. If limits are based on 
fixed-sample-size formulae, the probability of exclusion of the true value does not appear 
to differ grossly from the nominal value. The ‘classical’ and sequential limits are particularly 
close for the ‘restricted’ design. 


I am indebted to Miss Irene Allen for computational assistance.
Table 1. Boundary points, maximum likelihood and unbiased estimates of $\theta$,
and 90 and 95% confidence limits for $\theta$ (Example 1)

                Estimates         90% confidence limits          95% confidence limits
                                Classical      Sequential      Classical      Sequential
  n    r       ML   Unbiased   lower  upper   lower  upper    lower  upper   lower  upper
Lower boundary
  2    0    0.000    0.000         —   0.78       —   0.78        —   0.84       —
  3    1    0.333    0.500      0.02   0.86    0.03   0.86     0.01   0.91    0.01
  5    2    0.400    0.667      0.08   0.81    0.14   0.87     0.05   0.85    0.09
  6    3    0.500    0.714      0.15   0.85    0.21   0.88     0.12   0.88    0.16
  7    4    0.571    0.750      0.22   0.87    0.29   0.89     0.18   0.90    0.23
  9    5    0.556    0.778      0.25   0.83    0.35   0.89     0.21   0.86    0.29
 10    6    0.600    0.786      0.30   0.85    0.38   0.90     0.26   0.88    0.33
 11    7    0.636    0.792      0.35   0.86    0.42   0.90     0.31   0.89    0.37
 13    8    0.615    0.796      0.36   0.84    0.45   0.90     0.32   0.86    0.40
 14    9    0.643    0.798      0.39   0.85    0.46   0.90     0.35   0.87    0.42
 15   10    0.667    0.799      0.42   0.86    0.48   0.90     0.38   0.88    0.44
 17   11    0.647    0.799      0.42   0.83    0.50   0.90     0.38   0.86    0.46
 18   12    0.667    0.800      0.45   0.84    0.51   0.90     0.41   0.87    0.46
 19   13    0.684    0.800      0.47   0.85    0.52   0.90     0.43   0.88    0.47
 21   14    0.667    0.800      0.46   0.83    0.52   0.90     0.43   0.85    0.48
 22   15    0.682    0.800      0.48   0.84    0.53   0.90     0.45   0.86    0.48
 23   16    0.696    0.800      0.50   0.85    0.53   0.90     0.47   0.87    0.49
 25   17    0.680    0.800      0.50   0.83    0.54   0.90     0.46   0.85    0.49
 26   18    0.692    0.800      0.51   0.84    0.54   0.90     0.48   0.86    0.49
 27   19    0.704    0.800      0.53   0.84    0.54   0.90     0.50   0.86    0.50
 29   20    0.690    0.800      0.52   0.83    0.54   0.90     0.49   0.85    0.50
 30   21    0.700    0.800      0.54   0.83    0.54   0.90     0.51   0.85    0.50
 31   22    0.710    0.800      0.55   0.84    0.55   0.90     0.52   0.86    0.50
 33   23    0.697    0.800      0.54   0.83    0.55   0.90     0.51   0.84    0.50
 34   24    0.706    0.800      0.55   0.83    0.55   0.90     0.52   0.85    0.50
 35   25    0.714    0.800      0.56   0.84    0.55   0.90     0.54   0.85    0.50
 36   26    0.722    0.800      0.57   0.84    0.55   0.90     0.55   0.86    0.50
 37   27    0.730    0.800      0.58   0.85    0.55   0.90     0.56   0.86    0.50
 38   28    0.737    0.800      0.59   0.85    0.55   0.90     0.57   0.87    0.50
 39   29    0.744    0.800      0.60   0.85    0.55   0.90     0.58   0.87    0.50
 40   30    0.750    0.800      0.61   0.86    0.55   0.90     0.59   0.87    0.50
Upper boundary
 40   31    0.775    0.800      0.64   0.88    0.55   0.90     0.62   0.89    0.50
 38   30    0.789    0.800      0.65   0.89    0.55   0.91     0.63   0.90    0.50
 34   27    0.794    0.800      0.65   0.90    0.55   0.91     0.62   0.91    0.50
 30   24    0.800    0.800      0.64   0.91    0.55   0.91     0.61   0.92    0.50
 26   21    0.808    0.800      0.64   0.92    0.55   0.91     0.61   0.94    0.50
 22   18    0.818    0.800      0.63   0.94    0.55   0.92     0.60   0.95    0.50
 18   15    0.833    0.801      0.62   0.95    0.55   0.94     0.59   0.96    0.50
 14   12    0.857    0.806      0.61   0.97    0.56   0.96     0.57   0.98    0.50
 10    9    0.900    0.833      0.61  0.995    0.57  0.991     0.56   0.99    0.51   0.99
  6    6    1.000    1.000      0.61      —    0.61      —     0.54      —    0.54      —
Table 2. Characteristics of the distributions of the maximum likelihood estimator $\hat\theta$ and the
unbiased estimator $\hat\theta_u$, and the probabilities of exclusion from the confidence intervals (Example 1)

Probability of exclusion of $\theta$ (above upper limit or below lower limit),
for 90% limits ($\gamma = 0.05$) and 95% limits ($\gamma = 0.025$), classical and sequential

0.09230   0.05288


Table 3. Mean and variance of the distribution of $\hat\theta$, and the probabilities of
exclusion from the confidence intervals (Example 2)

Distribution of $\hat\theta$        Probability of exclusion of $\theta$ from 95% confidence interval
Table 4. Boundary points, maximum likelihood estimates of $\theta$,
and 95% confidence limits for $\theta$ (Example 2)

                         95% confidence limits
                       Classical       Sequential
  n    r      ML      lower  upper    lower  upper
Lower boundary
 25    0    0.000         —   0.14        —   0.14
 35    1    0.029     0.001   0.15    0.001   0.16
 45    2    0.044     0.005   0.15    0.007   0.17
 55    3    0.055      0.01   0.15     0.01   0.17
 65    4    0.062      0.02   0.15     0.02   0.18
 68    5    0.074      0.02   0.16     0.03   0.18
 69    6    0.087      0.03   0.18     0.03   0.19
 70    7    0.100      0.04   0.20     0.04   0.20
Upper boundary
 70    8    0.114      0.05   0.21     0.05   0.20
 69    8    0.116      0.05   0.22     0.05   0.20
 68    8    0.118      0.05   0.22     0.05   0.21
 67    8    0.119      0.05   0.22     0.05   0.21
 66    8    0.121      0.05   0.23     0.05   0.21
 65    8    0.123      0.06   0.23     0.05   0.22
 64    8    0.125      0.06   0.23     0.05   0.22
 63    8    0.127      0.06   0.24     0.05   0.23
 62    8    0.129      0.06   0.24     0.05   0.24
 61    8    0.131      0.06   0.24     0.05   0.25
 60    8    0.133      0.06   0.25     0.05   0.25
 59    8    0.136      0.06   0.25     0.05   0.26
 58    8    0.138      0.06   0.25     0.05   0.26
 57    8    0.140      0.06   0.26     0.05   0.26
 56    8    0.143      0.06   0.26     0.05   0.27
 55    8    0.146      0.06   0.27     0.05   0.27
 54    8    0.148      0.07   0.27     0.05   0.27
 53    8    0.151      0.07   0.28     0.05   0.28
 52    8    0.154      0.07   0.28     0.05   0.28
 51    8    0.157      0.07   0.29     0.05   0.28
 50    8    0.160      0.07   0.29     0.05   0.28
 49    8    0.163      0.07   0.30     0.05   0.28
 48    8    0.167      0.07   0.30     0.05   0.28
 47    8    0.170      0.08   0.31     0.05   0.28
 45    7    0.156      0.06   0.30     0.05   0.29
 44    7    0.159      0.07   0.30     0.05   0.29
 43    7    0.163      0.07   0.31     0.05   0.29
 42    7    0.167      0.07   0.31     0.05   0.29
 41    7    0.171      0.07   0.32     0.06   0.29
 40    7    0.175      0.07   0.33     0.06   0.30
Table 4 (continued)

                         95% confidence limits
                       Classical       Sequential
                      lower  upper    lower  upper
                       0.08   0.34     0.06   0.30
                       0.08   0.34     0.06   0.30
                       0.08   0.35     0.06   0.30
                       0.06   0.34     0.06   0.31
                       0.07      …     0.06   0.33
                       0.07   0.36     0.06   0.34
                       0.07   0.37     0.06   0.35
                       0.07   0.38     0.06   0.36
                       0.08   0.39     0.06   0.37
                       0.08   0.40     0.06   0.37
                       0.08   0.41     0.06   0.38
                       0.09   0.42     0.06   0.38
                       0.07      …     0.06   0.39
                       0.07   0.42     0.06   0.39
                       0.07   0.44     0.07   0.40
                       0.08   0.45     0.07   0.42
                       0.08   0.47     0.07   0.44
                       0.09   0.49     0.07   0.46
                       0.09   0.51     0.07   0.47
                       0.10   0.54     0.07   0.48
                       0.10   0.56     0.07   0.49
                       0.08   0.55     0.08   0.52
                       0.08   0.58     0.08   0.55
                       0.09   0.61     0.09   0.58
                       0.10   0.65     0.09   0.61
                       0.11   0.69     0.10   0.66
                       0.12   0.74     0.11   0.70
                       0.14   0.79     0.12   0.76
                       0.16   0.84     0.13   0.80
                       0.18   0.90     0.14   0.85
                       0.15   0.95     0.15   0.94
                       0.19  0.994     0.19  0.992
                       0.29      —     0.29      —
Table 5. Boundary points, maximum likelihood and unbiased estimates of θ, and 95% confidence limits for θ (Example 3)

                 Estimates of θ               95% confidence limits
               Maximum                     Classical        Sequential
  n    r     likelihood   Unbiased       Lower   Upper     Lower   Upper
Lower boundary
  8    0       0.000       0.000           –     0.37        –     0.37
 11    1       0.091       0.125         0.002   0.41      0.003   0.43
 14    2       0.143       0.192         0.02    0.43      0.02    0.45
 17    3       0.176       0.234         0.04    0.43      0.05    0.46
 20    4       0.200       0.263         0.06    0.44      0.07    0.47
 23    5       0.217       0.284         0.07    0.44      0.09    0.48
 26    6       0.231       0.300         0.09    0.44      0.11    0.48
 29    7       0.241       0.312         0.10    0.44      0.12    0.49
 32    8       0.250       0.323         0.11    0.43      0.14    0.49
 35    9       0.257       0.331         0.12    0.43      0.15    0.49
 38   10       0.263       0.338         0.13    0.43      0.16    0.49
 41   11       0.268       0.344         0.14    0.43      0.17    0.49
 44   12       0.273       0.349         0.15    0.43      0.17    0.49
Lower half of middle boundary
 44   13       0.295       0.349         0.17    0.45      0.18    0.49
 43   13       0.302       0.349         0.17    0.46      0.18    0.49
 42   13       0.310       0.349         0.18    0.47      0.18    0.49
 41   13       0.317       0.350         0.18    0.48      0.19    0.50
 40   13       0.325       0.351         0.19    0.49      0.19    0.50
 39   13       0.333       0.354         0.19    0.50      0.19    0.51
 38   13       0.342       0.357         0.20    0.51      0.20    0.51
 37   13       0.351       0.361         0.20    0.53      0.20    0.52
 36   13       0.361       0.366         0.21    0.54      0.21    0.53
 35   13       0.371       0.372         0.21    0.55      0.22    0.54
 34   13       0.382       0.379         0.22    0.56      0.22    0.55
 33   13       0.394       0.387         0.23    0.58      0.23    0.57
 32   13       0.406       0.396         0.24    0.59      0.24    0.58
 31   13       0.419       0.407         0.24    0.61      0.25    0.60
 30   13       0.433       0.418         0.25    0.63      0.26    0.61
 29   13       0.448       0.432         0.26    0.64      0.26    0.63
 28   13       0.464       0.447         0.27    0.66      0.27    0.65
 27   13       0.481       0.463         0.28    0.68      0.29    0.67
 26   13       0.500       0.500         0.30    0.70      0.30    0.70
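The unbiased estimates in Table 5 can be checked with the path-counting rule of Girshick, Mosteller & Savage (1946). At a boundary point, the unbiased estimate is the number of admissible sample paths to that point whose first observation is a success, divided by the total number of admissible paths. The Python sketch below is illustrative only. It assumes, from the tabulated points, that the lower boundary of Example 3 is the line n = 8 + 3r. It ignores the other boundaries, so it should only be relied on for the first lower-boundary rows. The names `count_paths`, `gms_unbiased_estimate` and `lower_boundary` are hypothetical.

```python
from fractions import Fraction


def count_paths(n, r, stops, first_success=False):
    """Number of sample paths from the origin to (n trials, r successes) that
    have not stopped earlier. stops(m, s) is True when sampling stops after
    m trials with s successes. With first_success=True, only paths whose
    first observation is a success are counted."""
    counts = {0: 1}  # successes -> number of admissible paths after m trials
    for m in range(n):  # extend every live path from m to m + 1 trials
        nxt = {}
        for s, c in counts.items():
            if m > 0 and stops(m, s):
                continue  # this path has already reached the boundary
            outcomes = (1,) if (m == 0 and first_success) else (0, 1)
            for x in outcomes:
                nxt[s + x] = nxt.get(s + x, 0) + c
        counts = nxt
    return counts.get(r, 0)


def gms_unbiased_estimate(n, r, stops):
    """Girshick-Mosteller-Savage unbiased estimate of theta at boundary point (n, r)."""
    return Fraction(count_paths(n, r, stops, first_success=True),
                    count_paths(n, r, stops))


def lower_boundary(m, s):
    """Assumed lower boundary of Example 3: stop when m - 3s reaches 8."""
    return m - 3 * s >= 8


for n, r in [(11, 1), (14, 2), (17, 3), (20, 4)]:
    est = gms_unbiased_estimate(n, r, lower_boundary)
    print(n, r, f"{r / n:.3f}", f"{float(est):.3f}")
```

For the points (11, 1), (14, 2), (17, 3) and (20, 4), this reproduces the maximum likelihood and unbiased estimates in the first rows of Table 5. Exact fractions are used so that the path counts are not affected by rounding.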


Table 6. Characteristics of the distributions of the maximum likelihood and unbiased estimates of θ, and the probabilities of exclusion from the confidence intervals (Example 3)

                           Probability of exclusion of θ
  θ                        from 95% confidence interval
 0.05     0.0368           0.00298      0.00535
 0.10     0.0744           0.00531      0.00936
 0.15     0.1131           0.00709      0.01184
 0.20       ?              0.00904      0.01303
 0.30       ?              0.01496      0.01263
 0.40       ?              0.01399      0.01119
  ?         ?              0.01077      0.01066




A STOCHASTIC MODEL FOR STUDYING THE PROPERTIES 
OF CERTAIN BIOLOGICAL SYSTEMS BY 
NUMERICAL METHODS 


By P. H. LESLIE 


Bureau of Animal Population, Department of Zoological 
Field Studies, Oxford 


CONTENTS
                                                          PAGE
1. Introduction                                            16
2. Stochastic model                                        16
3. Deterministic models                                    17
4. The varieties of stochastic models                      20
5. Models in which the birth-rate remains constant         21
6. Models in which the death-rate remains constant         23
7. Some numerical results for a logistic process           25
8. The chance of extinction in a logistic population       28




1. INTRODUCTION 


Although there is little difficulty in formulating stochastic models for two interacting species 
of living organisms, the intractability of the resulting equations from the mathematical 
point of view is a very serious obstacle to progress (Chin Long Chiang, 1954; Bartlett, 1957). 
In order to study the qualitative properties of some biological system such as that of a 
predator and prey, or that of two competing species, an approach to the problem by way 
of a set of Monte Carlo experiments may be, for the moment, more rewarding, at least until 
some of the difficulties in handling the full theoretical equations have been resolved. The 
following model is very easily adapted for use numerically on an ordinary hand machine, 
or preferably with the help of an electronic computer. It is developed here for the case of a 
single species living alone in a limited environment, namely, a logistic process; for a system 
of two competing species; and also for the predator-prey type of interaction. 


2. STOCHASTIC MODEL 


Suppose that a population of some species, which may be living alone or interacting with some other species in a limited environment, consists of $N_t$ individuals at time $t$. Suppose also that the expected change in numbers during the discrete interval of time $t$ to $t+1$ can be defined in terms of some suitable deterministic model. If, according to this model, we expect $B_t$ births and $D_t$ deaths to occur during the interval, then evidently

$$E(N_{t+1}) = N_t + B_t - D_t. \tag{2.1}$$

If we define

$$E(N_{t+1})/N_t = \lambda_t = e^{r_t}, \tag{2.2}$$

we have from (2.1)

$$\lambda_t = 1 + \beta_t - \delta_t, \tag{2.3}$$

where $\beta_t = B_t/N_t$ is the birth-rate and $\delta_t = D_t/N_t$ the death-rate, per unit of time, expressed in terms of the $N_t$ individuals alive at the beginning of the interval.
We may also define an expected birth-rate $b_t$ by

$$B_t = b_t \int_0^1 N_t e^{r_t \tau}\, d\tau,$$

from which $B_t = \dfrac{b_t N_t}{r_t}\,(e^{r_t} - 1)$, or $b_t = \dfrac{\beta_t r_t}{\lambda_t - 1}$. Similarly, an expected death-rate $d_t$ may be defined by

$$D_t = d_t \int_0^1 N_t e^{r_t \tau}\, d\tau,$$

or $d_t = \dfrac{\delta_t r_t}{\lambda_t - 1}$, so that, using (2.3), $d_t = b_t - r_t$.


Thus, working in discrete time intervals, we can regard the expected change in numbers from $N_t$ to $N_{t+1}$ as taking place through the operation of a birth-rate $b_t$ and a death-rate $d_t$, which remain constant throughout the interval.
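As a small numerical check of these definitions, the following Python sketch converts per-interval proportions $\beta_t$ and $\delta_t$ into the constant rates $b_t$ and $d_t$. The function name is hypothetical and the input values are illustrative only.

```python
import math


def instantaneous_rates(beta_t, delta_t):
    """Convert beta_t = B_t/N_t and delta_t = D_t/N_t into the constant rates
    b_t = beta_t r_t / (lambda_t - 1) and d_t = delta_t r_t / (lambda_t - 1),
    where lambda_t = 1 + beta_t - delta_t = exp(r_t)."""
    lam_t = 1.0 + beta_t - delta_t
    if lam_t <= 0.0:
        raise ValueError("lambda_t must be positive")
    if lam_t == 1.0:
        return beta_t, delta_t  # limit of r_t / (lambda_t - 1) as r_t -> 0 is 1
    r_t = math.log(lam_t)
    factor = r_t / (lam_t - 1.0)
    return beta_t * factor, delta_t * factor


b_t, d_t = instantaneous_rates(1.2, 0.2)  # illustrative values: lambda_t = 2
```

By construction $b_t - d_t = r_t = \log_e \lambda_t$, which is the property used in (2.4).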

Consider a simple 'birth' and 'death' process whose constant rates are expressed in terms of some convenient unit of time. According to standard theory (Kendall, 1949), the mean population size at time $t+1$, given $N_t$ individuals at time $t$, is

$$E(N_{t+1}) = e^{b_t - d_t} N_t = \lambda_t N_t,$$

and the variance is

$$\operatorname{var}(N_{t+1}) = \begin{cases} \dfrac{b_t + d_t}{b_t - d_t}\, e^{b_t - d_t}\left(e^{b_t - d_t} - 1\right) N_t & (b_t \neq d_t),\\[1ex] 2 b_t N_t & (b_t = d_t). \end{cases} \tag{2.4}$$


In the case of a species living alone, both $b_t$ and $d_t$ will be some functions of $N_t$. If the form of these functions is specified, it should be possible to adapt this model for use in a step-by-step Monte Carlo realization of the process. To simplify matters, we might assume as an approximation that $N_{t+1}$ is normally distributed with mean $\mu$ and variance $\sigma^2$ defined by (2.4). All negative values of $N_{t+1}$ are then attributed to $N_{t+1} = 0$. (This approximation may not be very good in the region of small $N$.) Then, given $N_t$, we could calculate $N_{t+1}$ with the help of a table of random normal deviates, and continue the process from the resulting value of $N_{t+1}$. The same type of model is also applicable to two interacting species $S_1$ and $S_2$. In that case the respective $b_1(t)$, $b_2(t)$, $d_1(t)$ and $d_2(t)$ will be some appropriate functions of the $N_1(t)$ and $N_2(t)$ individuals alive in each population at time $t$.
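One step of the Monte Carlo scheme just described might be sketched as follows, using the moments (2.4). This is a minimal sketch assuming Python's `random.gauss` in place of a table of random normal deviates. Rounding the draw to a whole number of individuals is an added assumption, and the function names are hypothetical.

```python
import math
import random


def birth_death_moments(n, b, d):
    """Mean and variance of N_{t+1} for a simple birth-and-death process run
    for one unit of time from N_t = n with constant rates b and d (eq. (2.4))."""
    r = b - d
    mean = math.exp(r) * n
    if abs(r) < 1e-12:
        var = 2.0 * b * n
    else:
        var = n * (b + d) / r * math.exp(r) * (math.exp(r) - 1.0)
    return mean, var


def normal_step(n, b, d, rng):
    """Draw N_{t+1} from the normal approximation with these moments,
    round it to an integer, and attribute negative values to zero."""
    mean, var = birth_death_moments(n, b, d)
    draw = rng.gauss(mean, math.sqrt(var))
    return max(0, round(draw))


rng = random.Random(1)  # arbitrary seed
n = 10
for _ in range(5):
    n = normal_step(n, 1.0, 0.8, rng)  # illustrative constant rates
```

In the biological models that follow, `b` and `d` would be recomputed at each step from the current numbers.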


3. DETERMINISTIC MODELS 

To develop this approach to the problem of two interacting species, we need the deterministic models that give the expected balance of births and deaths during the interval $t$ to $t+1$. These must be expressed in the form

$$N_1(t+1) = f_1\{N_1(t), N_2(t)\}\, N_1(t), \qquad N_2(t+1) = f_2\{N_1(t), N_2(t)\}\, N_2(t).$$

We consider three cases: the logistic model for a single species, the case of two competing species, and finally the predator-prey type of interaction. The simplest equations for each are as follows.
18 Stochastic model for studying biological systems by numerical methods


(a) Single species

The required expression is obtained very easily (Leslie, 1957) from the logistic differential equation

$$\frac{dN}{dt} = (r - aN)N, \tag{3.1}$$

where $a$ is a positive constant. The constant $r$ is the difference between a birth-rate $b$ and a death-rate $d$. It represents the intrinsic rate of increase of the species, which would only be approached if no limitations either of food or space were placed upon the increase in numbers. The familiar integral of (3.1) is

$$N_t = \frac{K}{1 + C e^{-rt}} \qquad (K = r/a), \tag{3.2}$$

where $K$ is the upper asymptote in numbers and the constant $C$ defines the initial state of the system. If we write $\lambda = e^{r}$, we have from (3.2), after a little rearrangement,

$$N_{t+1} = \frac{\lambda N_t}{1 + \alpha N_t}, \tag{3.3}$$

where the constant $\alpha = (\lambda - 1)/K$.
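The equivalence of the difference equation (3.3) and the integral (3.2) at integer times can be verified numerically. The sketch below assumes illustrative values $K = 100$, $\lambda = 2$ and $N_0 = 15$, and chooses $C$ so that (3.2) gives $N_0$ at $t = 0$. The function names are hypothetical.

```python
import math


def logistic_closed_form(t, K, r, C):
    """N_t = K / (1 + C exp(-r t)), equation (3.2)."""
    return K / (1.0 + C * math.exp(-r * t))


def logistic_step(n, lam, alpha):
    """N_{t+1} = lam * N_t / (1 + alpha * N_t), equation (3.3)."""
    return lam * n / (1.0 + alpha * n)


K, r = 100.0, math.log(2.0)  # illustrative values, so lambda = e^r = 2
lam = math.exp(r)
alpha = (lam - 1.0) / K
N0 = 15.0
C = K / N0 - 1.0  # makes the closed form equal N0 at t = 0

n = N0
series = [n]
for t in range(1, 11):
    n = logistic_step(n, lam, alpha)
    series.append(n)
```

Each element of `series` agrees with `logistic_closed_form(t, K, r, C)` up to floating-point rounding.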


(b) Two competing species

Suppose that two species $S_1$ and $S_2$, if each were living alone in a limited environment, would increase in numbers according to a logistic equation (3.3). Then, when both are competing together in the same environment, we may write

$$N_1(t+1) = \frac{\lambda_1 N_1(t)}{1 + \alpha_1\{N_1(t) + \gamma_1 N_2(t)\}}, \qquad N_2(t+1) = \frac{\lambda_2 N_2(t)}{1 + \alpha_2\{N_2(t) + \gamma_2 N_1(t)\}}. \tag{3.4}$$

Here $\lambda_1$ and $\alpha_1$ are the logistic parameters for the species $S_1$ when it is living alone, and $\lambda_2$ and $\alpha_2$ those for the species $S_2$. The positive constants $\gamma_1$ and $\gamma_2$ express the magnitude of the effect which each species has on the rate of increase of the other.

Working in discrete time intervals, the system of equations (3.4) is closely related to the well-known Lotka-Volterra differential equations for two competing species

$$\frac{dN_1}{dt} = (r_1 - a_1 N_1 - b_1 N_2)N_1, \qquad \frac{dN_2}{dt} = (r_2 - a_2 N_2 - b_2 N_1)N_2. \tag{3.5}$$

To see this, take the first member of (3.4) as an example. If from (3.3) we write $\alpha_1 = (\lambda_1 - 1)/K_1$ and put $\gamma_1 = k_1$, then

$$N_1(t+1) = \frac{\lambda_1 N_1(t)}{1 + (\lambda_1 - 1)\{N_1(t) + k_1 N_2(t)\}/K_1}.$$


P. H. LESLIE 19


The value of the parameter $\lambda_1$ depends on the unit of time which has been adopted. Suppose that for an interval of time $h$ we have $\lambda_1(h) = \lambda_1^h$; then

$$\frac{N_1(t+h) - N_1(t)}{h} = \frac{\lambda_1^h - 1}{h}\, N_1(t)\, \frac{1 - \{N_1(t) + k_1 N_2(t)\}/K_1}{1 + (\lambda_1^h - 1)\{N_1(t) + k_1 N_2(t)\}/K_1},$$

and as $h \to 0$ this may be replaced by

$$\frac{dN_1}{dt} = (\log_e \lambda_1)\, N_1 \left[1 - \frac{N_1 + k_1 N_2}{K_1}\right].$$

This has the same form as the first member of (3.5) if in the latter we put $\log_e \lambda_1 = r_1$, $a_1 = r_1/K_1$, and the ratio $b_1/a_1 = k_1$. The second member of (3.4) is related in the same way to the second member of (3.5).

The properties of the system (3.4) are similar to those of (3.5). The latter has a stationary state when

$$N_1 = \frac{a_2 r_1 - b_1 r_2}{a_1 a_2 - b_1 b_2}, \qquad N_2 = \frac{a_1 r_2 - b_2 r_1}{a_1 a_2 - b_1 b_2}.$$

Suppose these equations have a solution $N_1 = L_1$, $N_2 = L_2$ with both $L_1$ and $L_2 > 0$. Then, as is well known, this stationary state will be stable if $a_1 a_2 > b_1 b_2$ and unstable if $a_1 a_2 < b_1 b_2$. Similarly, the system (3.4) will have a stationary state when the denominators in the equations are equal to $\lambda_1$ and $\lambda_2$ respectively, that is, when

$$N_1 = \frac{\alpha_2(\lambda_1 - 1) - \gamma_1 \alpha_1 (\lambda_2 - 1)}{\alpha_1 \alpha_2 (1 - \gamma_1 \gamma_2)}, \qquad N_2 = \frac{\alpha_1(\lambda_2 - 1) - \gamma_2 \alpha_2 (\lambda_1 - 1)}{\alpha_1 \alpha_2 (1 - \gamma_1 \gamma_2)}. \tag{3.6}$$

Given a solution to these equations with $L_1, L_2 > 0$, this state will be stable when $\gamma_1 \gamma_2 < 1$ and unstable when $\gamma_1 \gamma_2 > 1$. By a suitable choice of the parameters in (3.4), we can thus construct a numerical system with either a stable or an unstable stationary state. The same applies to the other possibilities that arise with two competing species. When one of the solutions of (3.6) is positive and the other negative, one of the species persists and the other disappears from the system.
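A short numerical illustration of (3.4) and (3.6) follows. The sketch computes the stationary state for a hypothetical set of parameters with $\gamma_1\gamma_2 < 1$ and iterates the deterministic system towards it. The parameter values are illustrative and not taken from the paper, and the function names are hypothetical.

```python
def competition_step(n1, n2, lam1, lam2, a1, a2, g1, g2):
    """One step of the deterministic competition model (3.4)."""
    new1 = lam1 * n1 / (1.0 + a1 * (n1 + g1 * n2))
    new2 = lam2 * n2 / (1.0 + a2 * (n2 + g2 * n1))
    return new1, new2


def competition_equilibrium(lam1, lam2, a1, a2, g1, g2):
    """Stationary state (3.6); a coexistence point only if both values are positive."""
    denom = a1 * a2 * (1.0 - g1 * g2)
    n1 = (a2 * (lam1 - 1.0) - g1 * a1 * (lam2 - 1.0)) / denom
    n2 = (a1 * (lam2 - 1.0) - g2 * a2 * (lam1 - 1.0)) / denom
    return n1, n2


# Hypothetical parameters with gamma1 * gamma2 = 0.25 < 1 (stable case).
params = dict(lam1=2.0, lam2=2.0, a1=0.01, a2=0.01, g1=0.5, g2=0.5)
L1, L2 = competition_equilibrium(**params)
n1, n2 = 10.0, 20.0
for _ in range(200):
    n1, n2 = competition_step(n1, n2, **params)
```

With these values both species settle at about 66.7. Choosing $\gamma_1\gamma_2 > 1$ instead would give the unstable case.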


(c) The predator-prey relationship

If $S_1$ is a species of prey and $S_2$ the predator, then the familiar Lotka-Volterra differential equations for the system are

$$\frac{dN_1}{dt} = (r_1 - b_1 N_2)N_1, \qquad \frac{dN_2}{dt} = (-r_2 + b_2 N_1)N_2. \tag{3.7}$$

For a discussion of the properties of a stochastic model based on this classical system, reference may be made to Bartlett (1957). From the biological point of view, however, these equations are not entirely satisfactory as a description of the interaction. In the first place, they make no allowance for intra-specific competition. This is easily remedied by inserting terms in $N_1^2$ and $N_2^2$ in the respective equations. A more serious deficiency is that the second member of (3.7) defines no upper limit to the relative rate of increase of the predator. An alternative set of equations (Leslie, 1948, §6) is

$$\frac{dN_1}{dt} = (r_1 - a_1 N_1 - b_1 N_2)N_1, \qquad \frac{dN_2}{dt} = \left(r_2 - a_2 \frac{N_2}{N_1}\right)N_2. \tag{3.8}$$

Now, if the prey becomes very numerous and $N_1 \to \infty$, then $N_2^{-1}\, dN_2/dt \to r_2$, the intrinsic rate of increase of the predator. Conversely, when $N_1 \to 0$, $N_2^{-1}\, dN_2/dt \to -\infty$, corresponding to the disappearance of the predator in the absence of any prey.

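Working in discrete time intervals, the set of equations analogous to (3.8) is

$$N_1(t+1) = \frac{\lambda_1 N_1(t)}{1 + \alpha_1 N_1(t) + \gamma_1 N_2(t)}, \qquad N_2(t+1) = \frac{\lambda_2 N_2(t)}{1 + \alpha_2 N_2(t)/N_1(t)}, \tag{3.9}$$

where $\alpha_1$, $\alpha_2$ and $\gamma_1$ are positive constants, $\log_e \lambda_1 = r_1$ and $\log_e \lambda_2 = r_2$. This system will have a stationary state when

$$N_1 = \frac{\alpha_2(\lambda_1 - 1)}{\alpha_1 \alpha_2 + \gamma_1(\lambda_2 - 1)}, \qquad N_2 = \frac{(\lambda_1 - 1)(\lambda_2 - 1)}{\alpha_1 \alpha_2 + \gamma_1(\lambda_2 - 1)}, \tag{3.10}$$

and the system will in general approach this stable state through a series of damped oscillations (Leslie, 1948).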
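The stationary state (3.10) can be checked to be a fixed point of (3.9). The following sketch uses hypothetical parameter values chosen only for illustration. It makes no claim about the form of the approach to equilibrium for these values, and the function names are hypothetical.

```python
def predator_prey_step(n1, n2, lam1, lam2, a1, a2, g1):
    """One step of the deterministic predator-prey model (3.9);
    n1 is the prey, n2 the predator (n1 must be positive)."""
    new1 = lam1 * n1 / (1.0 + a1 * n1 + g1 * n2)
    new2 = lam2 * n2 / (1.0 + a2 * n2 / n1)
    return new1, new2


def predator_prey_equilibrium(lam1, lam2, a1, a2, g1):
    """Stationary state (3.10)."""
    denom = a1 * a2 + g1 * (lam2 - 1.0)
    n1 = a2 * (lam1 - 1.0) / denom
    n2 = (lam1 - 1.0) * (lam2 - 1.0) / denom
    return n1, n2


params = (2.0, 1.5, 0.001, 0.5, 0.01)  # hypothetical lam1, lam2, a1, a2, g1
eq = predator_prey_equilibrium(*params)
n1, n2 = 20.0, 5.0
trajectory = [(n1, n2)]
for _ in range(30):
    n1, n2 = predator_prey_step(n1, n2, *params)
    trajectory.append((n1, n2))
```

Inspecting `trajectory` for other parameter choices is one way to see how the approach to the stationary state depends on $\lambda_1$, $\lambda_2$, $\alpha_1$, $\alpha_2$ and $\gamma_1$.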


4. THE VARIETIES OF STOCHASTIC MODELS

The deterministic models (3.3), (3.4) and (3.9) give for each species the expected numbers $N_{t+1}$ at time $t+1$, given $N_1(t)$ and $N_2(t)$ individuals at time $t$. They express merely the balance between the birth-rate and the death-rate of the particular species during the interval. Thus, in each case we have an expression of the form

$$N_a(t+1) = \frac{\lambda_a N_a(t)}{q_a(t)} \qquad (a = 1, 2), \tag{4.1}$$

where $q_a(t)$ is some function of the numbers in each population at time $t$. For each species, therefore, we have from (2.3), for the interval $t$ to $t+1$,

$$\lambda_a(t) = \frac{\lambda_a}{q_a(t)} = 1 + \beta_a(t) - \delta_a(t) \qquad (a = 1, 2). \tag{4.2}$$

But to calculate the variance of the distribution from (2.4), we must specify both the birth-rate and the death-rate during the interval as some functions of $N_1(t)$ and $N_2(t)$.

The difficulty here is best seen by considering the logistic differential equation

$$\frac{dN}{dt} = (r - aN)N. \tag{4.3}$$
This may also be written

$$\frac{dN}{dt} = \{\psi(N) - \phi(N)\}N,$$

where the birth-rate function and the death-rate function are

$$\psi(N) = b - a_1 N, \qquad \phi(N) = d + a_2 N, \tag{4.4}$$

with $b - d = r$ and $a_1 + a_2 = a$ in (4.3). There are therefore, in theory, a large number of possible stochastic models for a logistic process with the same deterministic equivalent (Kendall, 1949; Bartlett, 1957). If we fix on some values of $b$ and $d$, taking as usual $b > d > 0$, these possibilities lie between $a_1 = 0, a_2 > 0$ and $a_1 > 0, a_2 = 0$ in (4.4). In other words, the extreme cases for the logistic are:

(1) $\psi(N) = b$, constant, and $\phi(N) = d + aN$;

(2) $\psi(N) = b - aN$ for $0 \le N \le b/a$, $\psi(N) = 0$ for $N > b/a$, and $\phi(N) = d$, constant.

In this last case, since a negative birth-rate is meaningless, the birth-rate function is defined only for values of $N$ between zero and $b/a$. The deterministic model, however, places no limitation on the initial value $N_0$ with which the system is started. We therefore have to define $\psi(N) = 0$ for $N > b/a$.

The same arguments evidently apply to the differential equations (3.5) and (3.8), for two competing species and for the predator-prey relationship respectively. An almost innumerable variety of different stochastic models for these systems can be imagined. From one point of view, however, these possibilities are bounded by the same two extreme cases: either the birth-rate or the death-rate of each species remains constant. We shall therefore consider the numerical development of this model in terms of these two limits.
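The two extreme cases (1) and (2) can be written as rate functions and checked against the deterministic equation. For $0 \le N \le b/a$, both give $\psi(N) - \phi(N) = r - aN$. The sketch below is illustrative, with hypothetical function names and parameter values.

```python
def brc_rates(n, b, d, a):
    """Case (1): constant birth-rate psi(N) = b, death-rate phi(N) = d + a N."""
    return b, d + a * n


def drc_rates(n, b, d, a):
    """Case (2): birth-rate psi(N) = b - a N, set to 0 for N > b/a;
    constant death-rate phi(N) = d."""
    return max(b - a * n, 0.0), d


b, d, a = 1.0, 0.3, 0.007  # illustrative: r = b - d = 0.7, K = r / a = 100
for n in (0, 50, 100, 140):
    print(n, brc_rates(n, b, d, a), drc_rates(n, b, d, a))
```

Beyond $N = b/a$, the constant death-rate form no longer reproduces $r - aN$, because the birth-rate is held at zero.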


5. MODELS IN WHICH THE BIRTH-RATE REMAINS CONSTANT

Dropping the suffix $a$ and confining our attention to one of the species, we have from (4.1) and (4.2)

$$N_{t+1} = \frac{\lambda}{q_t} N_t = \lambda_t N_t, \tag{5.1}$$

where the constant

$$\lambda = e^{r} = e^{b - d}. \tag{5.2}$$

Since the birth-rate is assumed to remain constant,

$$\log_e \lambda_t = b - d_t = r_t. \tag{5.3}$$

We may therefore write (2.4) in the form

$$E(N_{t+1}) = \lambda_t N_t, \qquad \operatorname{var}(N_{t+1}) = \phi\, E(N_{t+1}), \tag{5.4}$$

where

$$\phi = \begin{cases} \dfrac{(2b - r_t)(e^{r_t} - 1)}{r_t} & (r_t \neq 0),\\[1ex] 2b & (r_t = 0). \end{cases} \tag{5.5}$$
In designing a Monte Carlo experiment based on this model, we must first decide on suitable numerical values of $\lambda$ and $b$ in (5.2). One way of proceeding would then be to tabulate the function $\phi$ in (5.4) over a range of possible values of $\lambda_t$ (or its natural logarithm $r_t$). However, if we are free to choose arbitrary values of the parameters $\lambda$ and $b$, this expression for the variance can be greatly simplified.

A particular value of $\lambda = e^{r}$ is compatible with a range of possible values of $b$ and $d$ ($b > d$). For certain combinations of $\lambda$ and the ratio $b/d = k > 1$, however, the function $\phi$ appears to remain comparatively stable over a wide range of positive and negative values of $r_t$. A rough preliminary calculation suggested that for $2.0 < \lambda < 2.5$ there would be a value of $k$ for which $\phi$ remained approximately constant over the range $r > r_t > -r$. For instance, the following are the values of $\phi = f(r_t)$ for the stated combinations of $\lambda$ and $k$.


λ = 2.0, k = 3.2;   λ = 2.25, k = 5.0;   λ = 2.5, k = 9.0

    r_t        φ
   0.811      1.87
   0.611      1.95
   0.411      2.00
   0.211      2.02
   0.111      2.02
   0          2.03
  −0.111      2.02
  −0.211      2.02
  −0.411      2.00
  −0.611      1.97
  −0.811      1.95
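The tabulated values of $\phi$ can be recomputed from (5.5). In the sketch below, $b$ is obtained from $\lambda$ and $k$ through $b - d = \log_e \lambda$ and $b = kd$. With $\lambda = 2.25$ and $k = 5.0$, the results agree with the tabulated column to within about 0.01; the arguments $\pm 0.811$ are $\pm\log_e 2.25$. The function names are hypothetical.

```python
import math


def birth_rate_from(lam, k):
    """Constant birth-rate b for given lambda = exp(b - d) and k = b/d > 1."""
    r = math.log(lam)
    return k * r / (k - 1.0)


def phi_brc(r_t, b):
    """Variance factor phi of (5.5): var(N_{t+1}) = phi * E(N_{t+1})."""
    if abs(r_t) < 1e-12:
        return 2.0 * b
    return (2.0 * b - r_t) * math.expm1(r_t) / r_t


b = birth_rate_from(2.25, 5.0)
args = (0.811, 0.611, 0.411, 0.211, 0.111, 0.0,
        -0.111, -0.211, -0.411, -0.611, -0.811)
values = [(r_t, phi_brc(r_t, b)) for r_t in args]
for r_t, phi in values:
    print(f"{r_t:7.3f}  {phi:.3f}")
```

`math.expm1` is used so that $e^{r_t} - 1$ stays accurate for small $|r_t|$.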


In all three cases, $\phi \approx 2.0$ over a relatively wide range of possible values of $r_t$. We can infer that the same approximation will hold for any $\lambda$ with $2.0 < \lambda < 2.5$ and some value of $k$ between 3.2 and 9.0. No very high degree of numerical accuracy is required in carrying out the computations by means of (5.4), and we are assuming, as an approximation, a normal distribution. A great deal of time will therefore be saved at each step by adopting values of $\lambda$ within this range, together with the appropriate value of $k = b/d$, and using the approximation $\phi = 2$ in the expression for the variance.

To summarize this model, in which the birth-rate of each species is assumed to remain constant: we calculate the expected numbers at time $t+1$ by means of

$$E[N_a(t+1)] = \frac{\lambda_a N_a(t)}{q_a(t)} \qquad (a = 1, 2), \tag{5.6}$$

and assume that $N_a(t+1)$ is distributed normally with variance

$$\operatorname{var}[N_a(t+1)] = 2E[N_a(t+1)], \tag{5.7}$$
provided we are able to choose values of $\lambda_a$ within the prescribed range. For the three systems considered here, we have

$$\begin{aligned}
&\text{Single species (logistic):} && q(t) = 1 + \alpha N(t),\\
&\text{Two competing species } S_1, S_2\text{:} && q_1(t) = 1 + \alpha_1\{N_1(t) + \gamma_1 N_2(t)\},\\
& && q_2(t) = 1 + \alpha_2\{N_2(t) + \gamma_2 N_1(t)\},\\
&\text{Predator-prey } (S_1 \text{ prey}, S_2 \text{ predator})\text{:} && q_1(t) = 1 + \alpha_1 N_1(t) + \gamma_1 N_2(t),\\
& && q_2(t) = 1 + \alpha_2 N_2(t)/N_1(t).
\end{aligned} \tag{5.8}$$

In each of these three cases, the constants $\alpha_1$, $\alpha_2$, $\gamma_1$ and $\gamma_2$, given $\lambda_1$ and $\lambda_2$, are chosen to give some convenient stationary state of the particular system, according to equations (3.3), (3.6) and (3.10).

This approximation for $\operatorname{var}[N_a(t+1)]$ will be found to hold over most of the ranges of $q_a(t)$ that occur in practice for one of these hypothetical populations. The assumption that $\phi = 2$ for $r > r_t > -r$ in (5.5) is equivalent to requiring, for a given value of $\lambda_a$ within the prescribed range, that $1 < q_a(t) < \lambda_a^2$. For instance, in the case of the logistic, suppose $\alpha$ has been chosen to give a stationary state $K = (\lambda - 1)/\alpha$. The approximation will then hold for all values of $N$ between zero and $N = (\lambda + 1)K$, a range which is ample for all practical purposes. For two competing species and for the predator-prey relationship, it is more difficult to define these limits concisely in terms of $N_1(t)$ and $N_2(t)$. For either system, however, experience has shown the following. Given any reasonable values of $N_1(0)$ and $N_2(0)$ in relation to the assumed stationary state, $q_a(t)$ remains less, and usually much less, than $\lambda_a^2$ in each case as the process develops.*
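The constant birth-rate model (5.6) to (5.8) for a single logistic species can be sketched as follows. This is a minimal sketch assuming Python's `random.gauss` for the normal deviates, with rounding to whole individuals and truncation at zero. The parameter values $\lambda = 2.0$ and $\alpha = 0.01$ match the experiments of §7, but the seed and the function names are hypothetical.

```python
import math
import random


def logistic_q(n, alpha):
    """q(t) = 1 + alpha * N(t) for a single species, from (5.8)."""
    return 1.0 + alpha * n


def brc_step(n, q, lam, rng):
    """One step of the constant birth-rate model: mean lam * N / q from (5.6),
    variance 2 * mean from (5.7). The normal draw is rounded to an integer
    and negative values are set to zero."""
    mean = lam * n / q
    draw = rng.gauss(mean, math.sqrt(2.0 * mean))
    return max(0, round(draw))


rng = random.Random(2024)  # arbitrary seed
lam, alpha = 2.0, 0.01     # K = (lam - 1) / alpha = 100
n = 15
path = [n]
for _ in range(10):
    n = brc_step(n, logistic_q(n, alpha), lam, rng)
    path.append(n)
```

For two species, the same step would be applied to each population with its own $q_a(t)$ from (5.8).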


6. MODELS IN WHICH THE DEATH-RATE REMAINS CONSTANT

In this case the development of the model in numerical terms is somewhat more complicated. Given a value of $\lambda = e^{r}$, for the interval of time $t$ to $t+1$ we have

$$\log_e \lambda_t = r_t = b_t - d,$$

where $b_t$ is now a function of $N_1(t)$ and $N_2(t)$, and $d$ for the particular species remains constant. Because a negative birth-rate is meaningless, we have to define

$$\lambda_t = \begin{cases} \lambda/q_t & (q_t \le \lambda e^{d}),\\ e^{-d} & (q_t > \lambda e^{d}), \end{cases}$$

since $b_t = 0$ when $q_t = \lambda e^{d}$.

Hence, corresponding to (5.4) and (5.5), we have

$$E(N_{t+1}) = \lambda_t N_t, \qquad \operatorname{var}(N_{t+1}) = \phi'\, E(N_{t+1}),$$
* In the case of two competing species, this approximation should hold at any point on the $(N_1, N_2)$ plane that lies below and to the left of the boundaries formed by the intersecting straight lines $q_1 = \lambda_1^2$ and $q_2 = \lambda_2^2$, where $q_1$ and $q_2$ are defined in (5.8).
where from (2.4)

$$\phi' = \begin{cases} \left(1 + \dfrac{2d}{r_t}\right)(\lambda_t - 1) & (r_t \neq 0),\\[1ex] 2d & (r_t = 0),\\[1ex] 1 - e^{-d} & (r_t = -d). \end{cases}$$


In this case there is evidently no combination of $\lambda$ and $b/d = k$ for which $\phi'$ remains approximately constant. In any numerical experiment using this type of model, however, $\phi'$ can always be tabulated for given $\lambda$ and $d$ between the limits $\lambda \ge \lambda_t \ge e^{-d}$. This model, with a constant death-rate, is likely to be used as a contrast to the model with a constant birth-rate. It is therefore of interest to tabulate $\phi'$ for the same values of $\lambda$ and $k$ as in the previous section. Because $b - d = \log_e \lambda$ and $b = kd$, we have $d = \log_e \lambda/(k - 1)$. This gives the following values of $d$ for the given $\lambda$ and $b/d = k$:

$\lambda = 2.0$, $k = 3.2$: $d = 0.3151$;   $\lambda = 2.25$, $k = 5.0$: $d = 0.2027$;   $\lambda = 2.5$, $k = 9.0$: $d = 0.1145$.

Taking these values of $\lambda$ and $d$, we have the following tables of $\phi' = f(\lambda_t)$, each ending with $\lambda_t = e^{-d}$.


  λ = 2.0, k = 3.2      λ = 2.25, k = 5.0     λ = 2.5, k = 9.0
  λ_t       φ'          λ_t       φ'          λ_t       φ'
  2.0     1.9092        2.25    1.8752        2.5     1.8749
  1.8     1.6577        2.05    1.6433        2.3     1.6574
  1.6     1.4045        1.85    1.4104        2.1     1.4395
  1.4     1.1492        1.65    1.1765        1.9     1.2211
  1.2     0.8913        1.45    0.9412        1.7     1.0021
  1.1     0.7612        1.25    0.7044        1.5     0.7824
  1.0     0.6302        1.05    0.4657        1.3     0.5619
  0.9     0.4981        1.00    0.4056        1.1     0.3403
  0.8     0.3648        0.95    0.3453        1.0     0.2290
   –        –           0.85    0.2243        0.9     0.1173
  0.7297  0.2703        0.8165  0.1835        0.8918  0.1082


In all three cases $\phi'$ is very nearly a linear function of $\lambda_t$. Working to two decimal places, which should be sufficient for all ordinary purposes, a very good approximation to $\phi'$ is given by the following straight lines:

$$\begin{aligned} \lambda = 2.00&: \quad \phi' = -0.66 + 1.29\lambda_t,\\ \lambda = 2.25&: \quad \phi' = -0.77 + 1.18\lambda_t,\\ \lambda = 2.50&: \quad \phi' = -0.87 + 1.10\lambda_t. \end{aligned} \tag{6.1}$$
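The straight lines (6.1) can be compared with the exact $\phi'$ computed from (2.4). The sketch below takes $d = \log_e \lambda/(k - 1)$ and evaluates both forms at a few values of $\lambda_t$. The function names are hypothetical. On the tabulated grids the two forms agree to within about 0.01.

```python
import math


def death_rate_from(lam, k):
    """Constant death-rate d for given lambda = exp(b - d) and k = b/d > 1."""
    return math.log(lam) / (k - 1.0)


def phi_drc(lam_t, d):
    """Exact variance factor phi' from (2.4), valid for exp(-d) <= lam_t:
    var(N_{t+1}) = phi' * E(N_{t+1})."""
    r_t = math.log(lam_t)
    if abs(r_t) < 1e-12:
        return 2.0 * d
    return (1.0 + 2.0 * d / r_t) * (lam_t - 1.0)


# Straight-line approximations (6.1): lambda -> (intercept, slope).
LINEAR = {2.00: (-0.66, 1.29), 2.25: (-0.77, 1.18), 2.50: (-0.87, 1.10)}


def phi_drc_linear(lam_t, lam):
    intercept, slope = LINEAR[lam]
    return intercept + slope * lam_t


for lam, k in [(2.0, 3.2), (2.25, 5.0), (2.5, 9.0)]:
    d = death_rate_from(lam, k)
    for lam_t in (lam, 1.0, math.exp(-d)):
        print(f"{lam:.2f} {lam_t:.4f} {phi_drc(lam_t, d):.4f} "
              f"{phi_drc_linear(lam_t, lam):.4f}")
```

At $\lambda_t = e^{-d}$ the exact form reduces to $1 - e^{-d}$, the last entry in each column of the table.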
In a numerical realization of this model, we might adopt one or other of these values of $\lambda$ and $d$, and calculate the expected numbers at time $t+1$ by means of

$$E[N_a(t+1)] = \begin{cases} \dfrac{\lambda_a N_a(t)}{q_a(t)} = \lambda_a(t) N_a(t) & (\lambda_a \ge \lambda_a(t) \ge e^{-d_a}),\\[1ex] e^{-d_a} N_a(t) & (q_a(t) > \lambda_a e^{d_a}), \end{cases} \qquad (a = 1, 2), \tag{6.2}$$

and

$$\operatorname{var}[N_a(t+1)] = \begin{cases} f[\lambda_a(t)]\, E[N_a(t+1)] & (\lambda_a \ge \lambda_a(t) \ge e^{-d_a}),\\ (1 - e^{-d_a})\, E[N_a(t+1)] & (q_a(t) > \lambda_a e^{d_a}). \end{cases}$$

The $q_a(t)$ for the three types of system are the same as in the previous section (equations (5.8)). The functions $\phi' = f[\lambda_a(t)]$ in the expression for the variance are given by (6.1) for three values of $\lambda$.

Some numerical results obtained by using these two types of model are given in the next 
section for a logistic process, while the results for a system of two competing species will 
be published in a later paper. 
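A corresponding sketch of one step of the constant death-rate model (6.2) follows. As an assumption, it uses the exact $\phi'$ of (2.4) rather than the straight-line approximations (6.1), together with Python's `random.gauss`. The function names are hypothetical.

```python
import math
import random


def drc_moments(n, q, lam, d):
    """Mean and variance of N(t+1) under the constant death-rate model (6.2),
    using the exact phi' from (2.4) instead of the straight lines (6.1)."""
    lam_t = max(lam / q, math.exp(-d))  # the birth-rate cannot fall below zero
    mean = lam_t * n
    r_t = math.log(lam_t)
    if abs(r_t) < 1e-12:
        phi = 2.0 * d
    else:
        phi = (1.0 + 2.0 * d / r_t) * (lam_t - 1.0)
    return mean, phi * mean


def drc_step(n, q, lam, d, rng):
    """Draw N(t+1) from the normal approximation, rounded and truncated at zero."""
    mean, var = drc_moments(n, q, lam, d)
    return max(0, round(rng.gauss(mean, math.sqrt(var))))


d = math.log(2.0) / (3.2 - 1.0)  # lambda = 2.0, k = 3.2
rng = random.Random(7)           # arbitrary seed
sample = [drc_step(50, 1.5, 2.0, d, rng) for _ in range(200)]
```

When $q$ exceeds $\lambda e^{d}$, the step reduces to a pure death process, with mean $e^{-d}N$ and variance $(1 - e^{-d})e^{-d}N$.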


7. SOME NUMERICAL RESULTS FOR A LOGISTIC PROCESS 


The question of the relationship between the possible types of stochastic model and their
deterministic equivalent arises in an analysis which has been given (Leslie, 1957) of some 
replicated experiments carried out by Gause (1934) with populations of the Protozoa, 
Paramecium aurelia and P. caudatum. The size of these populations, when each species 
was living alone, became fairly large, the number of individuals in each replicate increasing 
from 20 initially to around 2000-6000 when they were in the region of their stationary state. 
It was assumed in the analysis that the changes in the mean values of these processes could 
be described adequately in terms of a deterministic model, and it was shown that in the 
case of both species living alone a logistic equation gave a satisfactory fit to the mean values, 
as judged by the degree of variation observed between the replicates. But it was not at all 
clear at the time the analysis was made, what the relation would be between the parameters 
of a logistic fitted empirically in this way and the true parameters of the process, assuming 
this was a random logistic. Moreover, it was completely unknown whether the decrease in 
the relative rate of increase in numbers of these populations, as they approached the sta- 
tionary state, was due to a decrease in the rate of division of the individuals, i.e. to a reduc- 
tion in the birth-rate, or to an increase in the death-rate, or to some combination of these 
factors. A set of experiments, therefore, was carried out with the two extreme types of 
logistic model, in order to see how the mean values of these random processes behaved in 
relation to the deterministic model as the value of the upper asymptote K increased in 
magnitude. 


In the deterministic model

$$N_{t+1} = \frac{\lambda N_t}{1 + \alpha N_t}, \tag{7.1}$$

the value $\lambda = 2.0$ was adopted, and $\alpha$ was taken in turn as 0.01 and 0.0005, so that from (3.3) the upper asymptotes were $K = 100$ and 2000 respectively. The initial numbers for the two populations were $N_0 = 15$ and 300, giving the same relative difference between $N_0$ and $K$ in each case. Each population was then assumed to be subject, first, to a constant birth-rate and variable death-rate (B.R.C. model), and secondly, to a variable birth-rate and constant death-rate (D.R.C. model). There were ten replicates of each type of population, and, starting at an origin of time, the processes were calculated up to $t = 10$. The purpose of these experiments was to compare the types of model and the effect of increasing the size of the populations, so the same set of random normal deviates was used throughout. A block of 10 × 10 deviates in units of $\sigma$ was taken quite arbitrarily from a convenient table given by Deming (1944, Appendix). A particular replicate, no. 1 for instance, was subject to the same sequence of deviates in each set of experiments. Any differences between the results can therefore be attributed either to the size of the populations or to the type of model used.


Table 1. The mean values of the ten replicates
Values of $N_t$, with range in parentheses

                  K = 100                              K = 2000
 t    B.R.C.           D.R.C.            B.R.C.                D.R.C.
 0    15.0             15.0              300.0                 300.0
 1    27.5 (15–40)     27.2 (17–38)      529.0 (475–585)       528.2 (480–578)
 2    45.9 (24–77)     44.9 (27–70)      849.9 (758–985)       846.4 (766–962)
 3    63.6 (34–97)     62.3 (39–88)      1200.2 (1061–1352)    1194.8 (1083–1323)
 4    69.8 (43–119)    71.? (50–101)     1473.7 (1355–1698)    1477.8 (1379–1645)
 5    85.7 (49–108)    8?.9 (59–99)      1718.8 (1556–1832)    1713.7 (1596–1802)
 6    83.8 (?–113)     87.5 (65–106)     1815.2 (1679–1935)    1825.8 (1735–1901)
 7    85.3 (60–112)    90.3 (77–105)     1881.8 (1770–1996)    1896.7 (1836–1962)
 8    91.9 (64–126)    95.9 (80–113)     1948.3 (1824–2082)    1952.3 (1883–2029)
 9    86.8 (70–98)     93.0 (83–99)      1936.4 (1858–1993)    1952.6 (1912–1985)
10    91.0 (67–120)    98.4 (83–113)     1961.2 (1870–2094)    1972.4 (1922–2048)

B.R.C. = constant birth-rate model. D.R.C. = constant death-rate model.

The calculations were carried out on an ordinary hand-machine, and no high degree of numerical accuracy was attempted. $E(N_{t+1})$ was calculated to the nearest integer, and $\sigma^2$ and $\sigma$ were obtained from it. The product of $\sigma$ and the random normal deviate was taken to the nearest integer and added to, or subtracted from, $E(N_{t+1})$, according to the sign of the deviate which had been drawn.
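The design of these experiments, in which every model is driven by the same block of deviates, can be sketched as follows for the case $K = 100$ ($\lambda = 2.0$, $\alpha = 0.01$, $N_0 = 15$). This is a minimal sketch under stated assumptions:

- the deviates come from Python's `random` module rather than Deming's table;
- the death-rate model uses the exact $\phi'$ rather than (6.1);
- $d = \log_e \lambda/(k - 1)$ with $k = 3.2$.

Expected values and the products of $\sigma$ and the deviate are rounded to the nearest integer, as described above. The function name `simulate` is hypothetical, and the sketch is not expected to reproduce the figures of Table 1.

```python
import math
import random


def simulate(model, deviates, lam=2.0, alpha=0.01, n0=15, k=3.2):
    """One replicate of the logistic process, driven by a fixed list of
    standard normal deviates. model is "BRC" (variance 2E, eq. (5.7)) or
    "DRC" (variance phi' E, eq. (6.2), with exact phi')."""
    d = math.log(lam) / (k - 1.0)
    n = n0
    path = [n]
    for z in deviates:
        q = 1.0 + alpha * n
        if model == "BRC":
            mean = round(lam * n / q)
            var = 2.0 * mean
        else:
            lam_t = max(lam / q, math.exp(-d))
            mean = round(lam_t * n)
            r_t = math.log(lam_t)
            phi = 2.0 * d if abs(r_t) < 1e-12 else (1.0 + 2.0 * d / r_t) * (lam_t - 1.0)
            var = phi * mean
        n = max(0, mean + round(math.sqrt(var) * z))
        path.append(n)
    return path


rng = random.Random(1944)  # arbitrary seed standing in for the published table
block = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(10)]  # replicates x steps

runs = {m: [simulate(m, row) for row in block] for m in ("BRC", "DRC")}
means = {m: [sum(p[t] for p in runs[m]) / 10 for t in range(11)] for m in runs}
```

Passing `alpha=0.0005, n0=300` gives the corresponding $K = 2000$ experiments with the same block of deviates.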

The results are presented in Table 1, which gives the mean values of the ten replicates in each set up to $t = 10$, together with the observed range of the individual $N_t$ in parentheses. In the later stages of growth, the mean values of these processes tend to settle down to a level below the deterministic asymptote. The relative difference between these levels and the upper asymptote is greater when $K = 100$ than when $K = 2000$. In each case the model with constant death-rate approaches nearer to $K$ than the constant birth-rate model. A difference in this direction between the mean values of a stochastic logistic process around the stationary state and the asymptote of the deterministic model is, however, to be expected theoretically.
A logistic curve was then fitted to each series of mean numbers in Table 1, by the same method as that used in the original analysis of Gause's data (Leslie, 1957, §4). The following estimates of the parameters $\lambda$, $\alpha$ and $K = (\lambda - 1)/\alpha$ in (7.1) were obtained.

Type of model        λ          α            K
B.R.C.             2.349     0.01534        87.9
D.R.C.             2.177     0.01257        93.6
(True values)     (2.000)   (0.01000)     (100.0)
B.R.C.             2.052     0.0005362     1962
D.R.C.             2.039     0.0005256     1977
(True values)     (2.000)   (0.0005000)   (2000)


In the first pair of experiments, when $K = 100$, the estimates of the parameters $\lambda$ and $\alpha$ differ quite appreciably from the true values. These differences are smaller for the constant death-rate model, presumably because of the smaller variance of this model as $N$ approaches $K$. When $K = 2000$, the parameters of the empirical logistics come much closer to the true values, and the differences between the two types of model are very much less.

These logistics appear to give an excellent fit to the observed series of means in Table 1. For instance, take as examples the two more extreme estimates of $\lambda$ and $\alpha$ in the table above. Given the initial values of the processes in each case, we have the following expected values compared with those observed.


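B.R.C. model ($K = 100$): $\lambda = 2.349$, $\alpha = 0.01534$, $N_0 = 15.0$.   B.R.C. model ($K = 2000$): $\lambda = 2.052$, $\alpha = 0.0005362$, $N_0 = 300.0$.

[Table: expected and observed mean values for $t = 1, \ldots, 10$.]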
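The expected values for this comparison follow from iterating (7.1) with the fitted parameters. The sketch below recomputes such series. It is an illustration, not a transcription of the published figures, and the function name is hypothetical.

```python
def fitted_logistic(lam, alpha, n0, steps=10):
    """Expected numbers from the fitted logistic (7.1):
    N_{t+1} = lam * N_t / (1 + alpha * N_t)."""
    series = [n0]
    for _ in range(steps):
        n = series[-1]
        series.append(lam * n / (1.0 + alpha * n))
    return series


brc_100 = fitted_logistic(2.349, 0.01534, 15.0)      # B.R.C. fit, K = 100 experiments
brc_2000 = fitted_logistic(2.052, 0.0005362, 300.0)  # B.R.C. fit, K = 2000 experiments
for t, (x, y) in enumerate(zip(brc_100, brc_2000)):
    print(f"{t:2d} {x:8.1f} {y:8.1f}")
```

Each printed column can be set beside the corresponding B.R.C. means in Table 1.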


These experiments lead to two main conclusions. First, the mean values of a random logistic can be fitted by an ordinary logistic curve. Second, the estimates of the parameters for these empirical logistics gradually approach the true values of the process as numbers increase in magnitude. For large populations, we might conclude that, to a fairly close approximation, the deterministic model is for all practical purposes the same as the mean of the stochastic model. Moreover, with populations of the size observed by Gause, the differences between the possible types of stochastic model for a logistic process appear not to be of any great importance.


8. THE CHANCE OF EXTINCTION IN A LOGISTIC POPULATION 

Recently, Bartlett (1957) has discussed the question of the chance of extinction in a 
random logistic process. He points out that although from the theory of finite Markov 
chains the ultimate state of such a system is N = 0, nevertheless, if we consider the variance 
of a small deviation from the upper asymptote (or, more strictly speaking, a quantity which 
he defines as $\gamma/\alpha'$, and which is equal to twice the variance of the deviation), then provided
this quantity is small, the chance of extinction may be neglected for any given time interval. 
He suggests that under such conditions the population will continue to show fluctuations 
with this variance. Since none of the processes calculated in the previous section showed any 
tendency to drift towards the absorbing barrier N = 0, and appeared in their later stages to 
be approaching some steady state, it is therefore of interest to relate the numerical results 
obtained by means of the models used here with the quantities expected from the more 
formal theoretical development given by Bartlett. 

Bartlett considers the asymptotic situation when $N \approx K$, or, more precisely, when there is a deviation $u$ from the upper asymptote $K$, defined by

$$u = (N - K)/K. \tag{8.1}$$

He then shows that, in the stochastic model, we have for small $u$

$$\operatorname{var}(u) = \tfrac{1}{2}\gamma/\alpha', \tag{8.2}$$

where a prime has been attached to his symbol $\alpha$ to distinguish it from the $\alpha$ used here for a logistic process. Let us write the birth-rate and death-rate functions respectively as (cf. equations (4.4))

$$\psi(N) = b - a_1 N, \qquad \phi(N) = d + a_2 N, \tag{8.3}$$

with $b - d = r$, $a_1 + a_2 = a$, and $K = r/a$ in the deterministic model

$$\frac{dN}{dt} = (r - aN)N.$$

Then, according to the definitions given by Bartlett, we have in (8.2)

$$\gamma = (b + d)/K + a_2 - a_1, \qquad \alpha' = b - d.$$


Thus, take the extreme cases in which either the birth-rate function or the death-rate function remains constant in the stochastic model, i.e. $a_1 = 0$ or $a_2 = 0$ respectively in (8.3). We should then have

$$\text{B.R.C. model:}\quad \operatorname{var}(u) = \frac{b}{K(b - d)}, \qquad \text{D.R.C. model:}\quad \operatorname{var}(u) = \frac{d}{K(b - d)}. \tag{8.4}$$
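The expectations in (8.4) can be computed from the general expressions for $\gamma$ and $\alpha'$. With $b = 1.0083$ and $d = 0.3151$, this sketch gives about 0.01455 for the B.R.C. model with $K = 100$ and about 0.0002273 for the D.R.C. model with $K = 2000$. These match the expected values for those cases in the table of results that follows. The function name is hypothetical.

```python
def bartlett_var_u(b, d, K, model):
    """Expected var(u) for small u = (N - K)/K, from (8.2) and (8.3):
    var(u) = gamma / (2 alpha'), gamma = (b + d)/K + a2 - a1, alpha' = b - d.
    With a = (b - d)/K, the constant birth-rate model ("BRC") has a1 = 0,
    a2 = a; the constant death-rate model ("DRC") has a1 = a, a2 = 0."""
    r = b - d
    a = r / K
    a1, a2 = (0.0, a) if model == "BRC" else (a, 0.0)
    gamma = (b + d) / K + a2 - a1
    alpha_prime = r
    return gamma / (2.0 * alpha_prime)


b, d = 1.0083, 0.3151
for K in (100, 2000):
    brc = bartlett_var_u(b, d, K, "BRC")
    drc = bartlett_var_u(b, d, K, "DRC")
    print(K, f"{brc:.4g}", f"{drc:.4g}", f"{brc / drc:.2f}")
```

The ratio of the two variances is $b/d$, independent of $K$, which is the property used in the discussion that follows.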
It is not easy to say in the case of developing systems, such as those illustrated in the
previous section, exactly from what point onwards in time these processes can be regarded
as approximating to their steady state. But suppose we assume that at the last three
calculated ‘censuses’, at t = 8, 9 and 10, these numerical realizations were on the average
in the neighbourhood of this state. Then, from the figures for the ten individual replicates
in each case, we can estimate by an ordinary analysis of variance the residual var(N),
which in these examples will be based on (10 − 1)(3 − 1) = 18 d.f., after eliminating the sums
of squares for ‘between times’ and ‘between replicates’. Since from (8·1) we have
var(u) = (1/K²) var(N), we can thus obtain estimates of this variance from the observed numbers,
which can be compared with those expected from (8·4), given the numerical values of the parameters
adopted in these processes, viz. b = 1·0083, d = 0·3151 (λ = 2·0). The results of the
calculations were as follows:


                         var (u)                   Ratio (B.R.C.)/(D.R.C.)
  K     Type of model    Estimated    Expected     Estimated    Expected
  100   B.R.C.           0·01781      0·01455
        D.R.C.           0·005852     0·004546     3·04         3·20
  2000  B.R.C.           0·0008951    0·0007273
        D.R.C.           0·0002825    0·0002273    3·17         3·20


It is evident that these variances are of much the same order as those expected. All four
are somewhat greater than expectation; but these estimates of var(u) cannot be entirely
independent, since the same block of random normal deviates was used in calculating each set
of processes. If we were to test any single one by means of a χ² test, the agreement with
expectation would be regarded as satisfactory (and even the total χ² is not excessive). An
important point is that it follows from (8·4) that the ratio of var(u) for the two types of
model should be equal to b/d, the ratio between the assumed values of the constant birth-rate
and the constant death-rate. It will be seen that the values of this ratio determined
from the estimated var(u) are very close to the true value of 3·20. Considering the
approximations which have been made, not only in carrying out the actual computations, but
also in developing these types of stochastic model, particularly in regard to the estimates
of var(N_{t+1}) in §§5 and 6, this surprisingly good agreement with expectation is most
encouraging.
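The residual variance on 18 d.f. comes from a two-way analysis of variance of a table of replicates by census times, removing the ‘between replicates’ and ‘between times’ sums of squares. A minimal Python sketch of that calculation follows; the function names are hypothetical and the demonstration table is invented, not the paper's data.

```python
def residual_mean_square(table):
    """Two-way analysis of variance without replication.

    table[i][j] is the number N in replicate i at census j.  Sums of squares
    'between replicates' (rows) and 'between times' (columns) are removed;
    returns (residual mean square, residual degrees of freedom).
    """
    R, C = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (R * C)
    row_means = [sum(row) / C for row in table]
    col_means = [sum(table[i][j] for i in range(R)) / R for j in range(C)]
    ss = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(R)
        for j in range(C)
    )
    df = (R - 1) * (C - 1)
    return ss / df, df


def var_u_estimate(table, K):
    """Convert the residual var(N) to var(u) = var(N) / K**2, as in (8.1)."""
    ms, df = residual_mean_square(table)
    return ms / K ** 2, df


# Invented counts: 10 replicates x 3 censuses (t = 8, 9, 10), for illustration only
demo = [[95, 101, 88], [84, 90, 97], [102, 93, 99], [89, 86, 94], [97, 104, 92],
        [91, 88, 85], [100, 96, 103], [87, 92, 90], [94, 99, 96], [93, 85, 98]]
print(var_u_estimate(demo, K=100))  # second element is the residual d.f.
```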

It has been stated that in the populations with the smaller numbers and greatest variance
(B.R.C. model: K = 100) there was no evidence of any tendency for the number of individuals
to approach zero, once the replicates had surmounted the early stages of their growth in
numbers. The steady state for this stochastic model appeared to be N ≈ 90, and the standard
deviation of random fluctuations about this state was estimated from the observed numbers
to be √178 = 13·3. That is to say, in terms of a normal distribution, we would have for
any given time interval P(N < 56) ≈ 5·10⁻³, P(N < 46) ≈ 5·10⁻⁴ and P(N ≤ 0) < 5·10⁻¹⁰.
Clearly, the chance that a replicate in the region of N = 90 will fall below N = 50 is small,
while the chance of extinction can obviously be neglected for any given time interval. It
seems reasonable to conclude, therefore, that in this particular system the chance of a
replicate becoming zero, once it is in the region of its stationary state, is negligibly small.
Thus, we might expect that under certain conditions, even quite small populations of a
single species, of the order of 100 individuals, could continue to fluctuate about some
steady state for comparatively lengthy periods of time without showing any tendency to
extinction.
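The tail probabilities quoted above follow from a normal approximation with mean 90 and standard deviation √178. A short Python check is given below; it uses `math.erfc` for the normal distribution function and applies no continuity correction, both of which are choices of this sketch rather than of the text.

```python
import math


def normal_cdf(x, mean, sd):
    """P(N < x) for a normal variate with the given mean and standard deviation."""
    return 0.5 * math.erfc((mean - x) / (sd * math.sqrt(2.0)))


mean, sd = 90.0, math.sqrt(178.0)  # steady state and estimated s.d. from the text
for threshold in (56, 46, 0):
    print(threshold, f"{normal_cdf(threshold, mean, sd):.1e}")
```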


The most striking evidence that relatively small populations of a single species can persist
almost indefinitely under stable environmental conditions is shown by the results of Park’s
experiments with the flour beetles, Tribolium confusum and T. castaneum (Park, 1954).
Replicated populations of these two species were observed when each was living alone in


numbers for this species, under these conditions, oscillated in a fairly regular fashion around
the overall average of about 80 individuals, falling as low as 35 on day 120 and 51 on day
450 (Park, 1954, p. 188, and Appendix, Table 2). These relatively long-term oscillations
about the equilibrium level may have been due, in part at least, to the changing age structure
of these populations, and such oscillations are likely to increase the variance about this
level, and hence to increase the likelihood of extinction. In all the remaining cases, however,
the populations of both species persisted (in the absence of infection and inadvertence),
some of the replicates for each of his physical treatments being observed for a total period
of 1860 days, or just over 5 years. Prof. Park has very kindly sent me copies of some graphs
and also the detailed data for all the individual replicates of both species observed at 29° C.
and 70% R.H. The mean total number of individuals in these populations when they were


physical conditions, is 55·6 days (Leslie & Park, 1949), so that a period of 1860 days for
this species represents a total turn-over of about 33-34 generations (1860/55·6 ≈ 33·5). Taking
the conventional mean length of a generation for man as being roughly 30 years, some of these
populations were observed, therefore, over a period of time which would be equivalent to about
1000 years in terms of human experience. As Park points out (1954, p. 188) in regard to these
experimental populations: ‘... it seems reasonable to assume that they would persist
indefinitely under obtaining procedures of husbandry unless new, deleterious influences,
whether ecological or genetic, developed spontaneously or were introduced.’
I am indebted to Mr D. G. Kendall for some most valuable criticisms and suggestions
which he made after seeing an early draft of part of this paper. I should also like to thank
Prof. T. Park for the generous way he has replied to my many inquiries by sending me the
detailed data for some of his experiments, and Prof. M. S. Bartlett for allowing me to see
the proofs of his recent paper before publication.
ON THE DERIVATION AND APPLICABILITY
OF NEYMAN'S TYPE A DISTRIBUTION 


By J. G. SKELLAM 
The Nature Conservancy, London 


1. Using a model suggested by ecological processes, Neyman (1939) derived a class of
‘contagious’ distributions of great interest to biologists. The two-parameter type A dis- 
tribution in particular has found many useful applications, but, in view of the ecologist’s 
prime interest in elucidating the processes of nature, it is important that no misunder- 
standing exists as to the kind of ecological or spatial picture which is implied not only in 
the original derivation but by subsequent workers (Thompson, 1954). 

2. It was supposed that a number of ‘centres’ were distributed at random in a large field 
F, that each centre gave rise to a number of offspring n (the actual number being a random 
variate with probability function p(n)), and that the offspring from any one centre were 
distributed in the space around that centre independently of one another. The actual law 
(f) governing the distribution of offspring about their centre of origin was not explicitly 
stated, though it was clear that it had the same basic character whatever the position of the 
particular centre of origin. The primary aim was to deduce the probability distribution of 
X, the total number of offspring occurring in a randomly chosen plot or quadrat (here 
called Q), taken as being unit area. 

Neyman denoted the probability that an offspring arising from a centre (ξ, η) falls into
Q as

$$P(\xi, \eta) = \iint_Q f(x - \xi,\, y - \eta)\, dx\, dy, \qquad (1)$$

where the integration extends over all points (x, y) in Q.
By regarding f(x − ξ, y − η) as zero for points (x, y) sufficiently removed from (ξ, η), those
centres capable of contributing offspring to Q were restricted to a region (here called 𝒜).
3. After obtaining the general result in terms of the unspecified functions p and f, the
nature of which may differ in different ecological situations, Neyman considers several
particular cases. The simplest procedure, leading to the so-called type A distribution, was
first to regard p(n) as the Poisson function and secondly to set

$$P(\xi, \eta) = \begin{cases} \lambda^{-1} & \text{for } (\xi, \eta) \text{ in } \mathcal{A}, \\ 0 & \text{for } (\xi, \eta) \text{ outside } \mathcal{A}. \end{cases} \qquad (2)$$
This representation of P(ξ, η)
approximation, and it is the

integer, and that the distribution of offspring in space should consist of λ equally sized
probability masses so spaced that no two are enclosed at the same time by any figure equal
(in shape and area) to Q and with the same orientation.

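4. The nature of the general result is connected with the self-evident fact that, if F_j(u, v)
(j = 1, 2, ..., λ) are a set of cumulative frequency functions such that

$$\iint_Q dF_j(x - \xi,\, y - \eta) = \begin{cases} 1 & \text{for } (\xi, \eta) \text{ in } L_j, \\ \tfrac12 & \text{for } (\xi, \eta) \text{ on boundary of } L_j, \\ 0 & \text{for } (\xi, \eta) \text{ outside } L_j, \end{cases} \qquad (3)$$

then

$$F(u, v) = \lambda^{-1} \sum_{j=1}^{\lambda} F_j(u, v)$$

satisfies

$$\iint_Q dF(x - \xi,\, y - \eta) = \begin{cases} \lambda^{-1} & \text{for } (\xi, \eta) \text{ in } L, \\ \tfrac12 \lambda^{-1} & \text{for } (\xi, \eta) \text{ on boundary of } L, \\ 0 & \text{for } (\xi, \eta) \text{ outside } L, \end{cases}$$

where L is the union of the L_j, provided that the L_j do not overlap, though their boundaries may coincide.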
5. Without loss of generality, the problem is most easily considered in its one-dimensional
form, viz.

$$\int_0^1 dF(x - \xi) = P(\xi) = \begin{cases} \lambda^{-1} & \text{for } \xi \text{ in } (\alpha, \beta), \\ 0 & \text{for } \xi \text{ outside } (\alpha, \beta). \end{cases} \qquad (4)$$

Clearly

$$F(1 - \xi) - F(-\xi) = P(\xi), \qquad (5)$$

where the cumulative function F(u) is by necessity a non-decreasing function of u.
Setting ξ in succession equal to α − δ, α − 1 − δ, α − 2 − δ, ... in (5), where δ is an arbitrarily
small positive quantity, we obtain

$$F(-\alpha + \delta) = F(1 - \alpha + \delta) = \dots = F(\infty) = 1,$$

so that F(u) = 1 for all u > −α. (6)
Similarly, F(u) = 0 for all u < 1 − β. (7)

The simplest particular solution arises when the points −α and 1 − β coincide, that is
when β − α = 1, and the whole probability mass therefore is located at this point. Then by
setting ξ = α + ½ = β − ½, we obtain from (4) and (5), F(½ − α) − F(−½ − α) = λ⁻¹, whence
from (6) and (7), λ = 1.
Consider now the possibility that the length of the interval (α, β) is not exactly an integer.
Then β − α = I + q = I − 1 + r, where I is the integral part of β − α, 0 < q < 1 and 1 < r < 2.
We can now choose sets of points ξ spaced one unit apart in (α, β) in two main ways:

(i) α + ½q, 1 + α + ½q, ..., I + α + ½q (= β − ½q);
(ii) α + ½r, 1 + α + ½r, ..., I − 1 + α + ½r (= β − ½r).

If the two sets of values of ξ are substituted in (5) and the results for each set added, the
sums telescope, and we obtain in the two cases

$$F(1 - \alpha - \tfrac12 q) - F(\tfrac12 q - \beta) = (I + 1)\lambda^{-1}, \qquad (8)$$

$$F(1 - \alpha - \tfrac12 r) - F(\tfrac12 r - \beta) = I\lambda^{-1}. \qquad (9)$$

From (6) and (7) the left-hand side of each equation equals 1, so that we are faced with the
contradiction I + 1 = I, except for the possible limiting case where λ and I tend to infinity
together.


There is no such contradiction, however, when the length of the interval (α, β) is exactly
an integer, provided that P(ξ) = ½λ⁻¹ at the end-points α and β. It then follows that
λ = I = β − α, and the cumulative function F(u) is a step function with I equal steps
spaced in succession at unit distance apart.
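The integer case can be verified exactly. The Python sketch below assumes Q is the unit interval (0, 1) and takes F to be the step function with I equal steps of 1/I at u = 1 − β, 2 − β, ..., −α, giving F the mid-value at each jump (an assumption of this sketch, chosen because it reproduces P = ½λ⁻¹ at the end-points). Evaluating P(ξ) = F(1 − ξ) − F(−ξ) with exact fractions shows P = 1/I inside (α, β), 1/(2I) at the end-points and 0 outside. The function names and the choice α = 0, I = 3 are illustrative.

```python
from fractions import Fraction


def step_solution(alpha, I):
    """Atoms (position, mass) of dF for beta - alpha = I:
    I equal masses 1/I at u = 1 - beta, 2 - beta, ..., -alpha."""
    beta = alpha + I
    return [(k - beta, Fraction(1, I)) for k in range(1, I + 1)]


def F(u, atoms):
    """Cumulative function, taking the mid-value at each jump."""
    total = Fraction(0)
    for position, mass in atoms:
        if u > position:
            total += mass
        elif u == position:
            total += mass / 2
    return total


def P(xi, atoms):
    """Equation (5): probability that an offspring from a centre at xi falls in (0, 1)."""
    return F(1 - xi, atoms) - F(-xi, atoms)


atoms = step_solution(Fraction(0), 3)  # alpha = 0, beta = 3, lambda = I = 3
for xi in (Fraction(-1), Fraction(0), Fraction(1, 2), Fraction(1), Fraction(3), Fraction(4)):
    print(xi, P(xi, atoms))
```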

6. The case just considered may be regarded as one where the interval (α, β) can be
resolved into I contiguous pieces of unit length each, and the method of argument is readily
extended to the more general situation where instead of a single interval there are several
non-overlapping intervals, the lengths of which are integral multiples of the basic unit (Q).
The nature of the two-dimensional result then becomes apparent.

Fig. 1


Fig. 1 illustrates a typical solution of the two-dimensional integral equation, where Q
for the purpose of a diagram is arbitrarily shown as a lozenge-shaped region. The manner of
distribution of the offspring about their centre O is represented by three distinct and equal
probability masses M_1, M_2, M_3. The region L_1 is shaded. If O lies in L_j, then M_j and only
M_j lies in Q, the shape and pattern of the L_j being determined by the shape of Q and the
pattern of the M's about O. The latter pattern may be chosen arbitrarily subject to the
condition that under translation the M's mutually exclude one another from Q.


[the position relative to Q in the virtually infini 
Since the offspring form a single compact cluster, i 


one another. For example, a group of lepid 
persist as a gregarious band. 

In the more general case, each centre gives rise to a ‘pattern’ of compact clusters
sufficiently well spaced from one another for it to be impossible for a quadrat of the size
selected to include more than one cluster from the same centre. Such might be the case, for
example, where a plant reproducing vegetatively sends out rhizomes, from which at well-spaced
intervals (e.g. nodes) a variable number of aerial shoots arise close together. The patterns
of clusters associated with the different centres need not be identical, but it is easy to see
that, if the number of clusters per centre is not fixed, it becomes necessary for cluster sizes
belonging to different centres to be independent random values from the same population
p(n).

Neyman's model for the two-parameter type A distribution, considered in the strict 
sense, can therefore be reduced to a comparatively simple situation, which readily lends 
itself to treatment by generating functions, as for example in Feller (1943) or Skellam (1952). 
For if G(z) is the p.g.f. of the number of centres which contribute a cluster to Q, and if g(z)
is the p.g.f. of the distribution p(n), then the p.g.f. of the distribution of X, the number of
offspring in Q, is given by Watson’s theorem as G(g(z)).
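For the two-parameter type A case both generating functions are Poisson, G(z) = exp{λ_1(z − 1)} and g(z) = exp{λ_2(z − 1)}, so X has p.g.f. exp{λ_1(e^{λ_2(z−1)} − 1)}. A convenient way to obtain its probabilities is the standard compound-Poisson recursion p_{x+1} = λ_1λ_2e^{−λ_2}/(x + 1) · Σ_{j=0}^{x} (λ_2^j/j!) p_{x−j}, starting from p_0 = G(g(0)). This recursion is a well-known device, not taken from the paper, and the names `lam1`, `lam2` are illustrative; the sketch checks the familiar mean λ_1λ_2 and variance λ_1λ_2(1 + λ_2).

```python
import math


def neyman_type_a_pmf(lam1, lam2, nmax):
    """P(X = x) for x = 0..nmax, where X has p.g.f. G(g(z)) with
    G(z) = exp{lam1 (z - 1)} (contributing centres) and
    g(z) = exp{lam2 (z - 1)} (offspring per cluster)."""
    p = [math.exp(-lam1 * (1.0 - math.exp(-lam2)))]  # P(X = 0) = G(g(0))
    c = lam1 * lam2 * math.exp(-lam2)
    for x in range(nmax):
        s = sum(lam2 ** j / math.factorial(j) * p[x - j] for j in range(x + 1))
        p.append(c * s / (x + 1))
    return p


probs = neyman_type_a_pmf(2.0, 1.5, 60)
mean = sum(x * px for x, px in enumerate(probs))
var = sum(x * x * px for x, px in enumerate(probs)) - mean ** 2
print(round(sum(probs), 6), round(mean, 4), round(var, 4))
```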


8. Because of the very special kind of distribution of progeny required to satisfy equations
(1) and (2), the question arises as to how it can happen that the type A distribution fits
such a wide range of ecological data so well. The reason for this circumstance seems to be
that, whilst the dispersal of the progeny about their birth place is continuous and the
probability distribution of their locations at the time of the census is often likely to be
represented by a continuous surface with a maximum in the neighbourhood of (ξ, η), the
value of the integral (1) considered as a function of (ξ, η) may be quite close to the plateau
defined by (2). The approximation will be particularly good wherever the dimensions of Q
are substantially larger than the greatest distance the progeny are likely to travel between
the time of birth and the census. In this case, whenever (ξ, η) is well inside Q, the probability
P(ξ, η) will differ from unity by only a small quantity; for (ξ, η) well outside Q, the value of
P(ξ, η) will be virtually zero; whilst, for (ξ, η) on or near the boundary, P(ξ, η) will be roughly
½, the last class of cases being considerably less frequent and less important than the first.
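The plateau argument can be illustrated in one dimension. The sketch below assumes, purely for illustration, a normal dispersal law f with standard deviation `sigma` and takes Q as the interval (0, `length`), so that P(ξ) = Φ((length − ξ)/σ) − Φ(−ξ/σ). With a quadrat twenty times wider than σ, P is essentially 1 well inside, essentially 0 well outside and about ½ at the ends; with a quadrat comparable to σ the plateau disappears. The parameter values are arbitrary.

```python
import math


def phi(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def p_in_quadrat(xi, length, sigma):
    """One-dimensional analogue of (1): Q = (0, length), displacement ~ N(0, sigma**2)."""
    return phi((length - xi) / sigma) - phi(-xi / sigma)


# Large quadrat relative to dispersal: close to the plateau (2) with lambda = 1
for xi in (-3.0, -1.0, 0.0, 1.0, 5.0, 10.0, 11.0):
    print(xi, round(p_in_quadrat(xi, length=10.0, sigma=0.5), 4))

# Small quadrat: even a centre in the middle of Q keeps only part of its offspring in Q
print(round(p_in_quadrat(0.5, length=1.0, sigma=0.5), 4))
```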

The above discussion suggests that, in the type of ecological situation which Neyman 
originally had in mind, the type A distribution has a better chance of fitting empirical 
distributions obtained by counting in large rather than small quadrats. 


9. Even so, the possibility that approximate solutions to the problem exist in opposite
circumstances, when Q is small and λ very large, is suggested by the limiting consideration
mentioned at the end of §5.

Consider first of all a number of exact solutions to (1) and (2), represented
diagrammatically in Fig. 2, where Q is represented as a single square, O_1, O_2, O_3 are randomly
placed centres, and associated with each is a set of minute clusters, all with the same probability
mass. The spatial extent of each set of clusters is shown by contours enclosing regions
R_1, R_2, etc. There is here no need to show 𝒜_1, 𝒜_2, etc., though it may be remarked that 𝒜_j and
R_j are alike in shape and area though not in position and orientation.

If the populations of the individual small clusters within R_j are Poisson variates all with
parameter λ, the expected number of offspring in R_j is λA_j (A_j being the area of R_j), so that
even if the offspring were not in dense clusters but distributed at random throughout R_j the
number contributing to any unit square in R_j would still be a Poisson variate with mean λ. It
may be remarked here that, since A_j is large, Neyman’s assumption on the independent
dispersal of offspring from their centre implies that the number of offspring per unit square
is a Poisson variate.

The limiting case (with the implication that p(n) is Poisson) is then virtually equivalent
to one where the offspring from a centre O_j are spread out independently and at random with
uniform probability density throughout some extensive region R_j. If the centres O_j are
numerous and at random, the number of centres contributing offspring to Q will be a
Poisson variate, and, if the probability densities in the regions R_j are the same for all j,
their contributions to Q will all be Poisson variates with a common distribution. The
Neyman type A distribution based on two parameters results immediately by the
compounding of these two Poisson distributions as indicated at the end of §7.
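The compounding described here can also be mimicked by simulation. Assuming a Poisson number of contributing centres with mean `lam1`, each adding an independent Poisson number of offspring with mean `lam2`, the counts in Q should show the type A mean λ_1λ_2 and variance λ_1λ_2(1 + λ_2). The sketch below uses only the standard library; the sampler is Knuth's multiplication method (adequate for small means), and the parameter values and seed are arbitrary.

```python
import math
import random


def poisson(rng, lam):
    """Poisson variate by Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_counts(lam1, lam2, n, seed=1):
    """Counts in Q: a Poisson(lam1) number of contributing centres,
    each adding an independent Poisson(lam2) number of offspring."""
    rng = random.Random(seed)
    return [sum(poisson(rng, lam2) for _ in range(poisson(rng, lam1))) for _ in range(n)]


counts = simulate_counts(3.0, 2.0, 40000)
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
print(round(mean, 3), round(var, 3))  # compare with 6 and 18 for these parameters
```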


CONCLUSION 


It would appear that the sampling situation to which the Neyman type A is particularly
suited is one where the organisms occur in compact clusters, an observation which may be
equally well applied to a number of other compound distributions with p.g.f.'s of the form
exp{λ(g(z) − 1)}. The compactness of the clusters is in fact a condition implied as a hidden
assumption in Neyman's original method of derivation. Nevertheless, the type A distribu- 
tion exhibits considerable robustness, and can be employed as an approximation in certain 
circumstances where the condition requiring compact clustering can be greatly relaxed. 


I am particularly grateful to Prof. Jerzy Neyman for his interest in this problem, his
scrutiny of the argument and his valuable suggestions on its presentation.
NEGATIVE BINOMIAL DISTRIBUTIONS WITH A COMMON k
By C. I. BLISS AND A. R. G. OWEN


1. INTRODUCTORY 


Many biological experiments are evaluated by counts of the number of individuals per unit 
of space or of events per unit of time. When the individuals or events are randomly dispersed, 
the variance of the count, within sampling fluctuations, is equal to the mean, but in a 
natural population, the observed variance often exceeds its mean significantly. Of the 
several distributions by which this ‘overdispersion’ may be described, the negative bi- 
nomial has a number of advantages. It is defined by the arithmetic mean m and a parameter 
k measuring dispersion. It can be derived from a variety of initial assumptions. Its wide 
applicability has been demonstrated empirically. Its statistics are well known (Anscombe, 
1950; Bliss & Fisher, 1953; D. A. Evans, 1953). 

Observed counts are often compared with each other in terms of their means. These 
comparisons are more direct and unequivocal if the respective distributions have the same 
relative dispersion in terms of k. Thus, in devising sequential sampling schemes for
tapeworm cysts in white fish (Oakland, 1950) or for spruce budworm on balsam fir (Morris, 1954;
Waters, 1955), a common k is an essential part of the underlying model. In studying
environmental factors modifying trawl catches of haddock (Taylor, 1953) or the survival of
hemlock seedlings (Olson, 1954), a stable k would simplify the comparisons materially.
Tests of insecticides and of fungicides may be judged from the number of surviving insects 
or damaged parts of plants. When treatments are applied in randomized blocks or other 
experimental designs, and their evaluation is based upon a count in each plot, counts 
transformed with an estimated common k will give the most informative analysis of
variance (Beall, 1942; Anscombe, 1949). In these and other cases, the negative binomial samples
with a single k may vary markedly in their means.

Several estimates of a common k have been described. Beall (1942) proposed an un- 
weighted moment estimate from the counts on duplicate plots in each block, a design 
which would double the size of each block and thereby increase the experimental error. 
When the standard deviation is linearly related to the mean of each set of counts, Klecz- 
kowski (1949) has shown that the intercept of an unweighted estimate of this regression 
provides an empirical constant similar to a common k, with which the variance can be 
stabilized. Since the information in each sample is a function of its mean, Anscombe (1949,
1950) has derived weights for computing a common k, suited to different methods for
estimating k. Bliss & Fisher (1953) have described a maximum likelihood estimate which
is efficient for all values of m and k but is hardly practicable when individual counts exceed
20 or 30.


The present paper concerns two approaches to the problem of estimating a common k,
both through successive approximations. The first is an extension of Anscombe’s weighted
moment estimate in terms of regression and small series. It is adapted for field studies and 
surveys where subsampling is customary. The second follows from the need for a common 
k when transforming negative binomial counts preparatory to an analysis of variance.
Subsampling is secondary and commonly omitted. If the estimated k is valid, the
transformed units will conform to the underlying assumption of additivity. We will reverse this
operation and estimate k as that value giving zero non-additivity by the Tukey (1949) test.
The resulting k fulfils a specific purpose and its limitations as an estimate are here
considered secondary.


2. THE MOMENT ESTIMATION OF A COMMON k 
(2·1). Derivation of a regression method. The moment estimate of k for a single negative
binomial distribution has long been computed as

$$k_1 = \frac{\bar u^2}{s^2 - \bar u}, \qquad (1)$$

where u is one of N individual counts and ū and s² are their observed mean and variance,
respectively (Fisher, 1941; Anscombe, 1950). Alternatively, we start with the two statistics

$$x' = \bar u^2 - \frac{s^2}{N} \quad\text{and}\quad y' = s^2 - \bar u. \qquad (2)$$

Their expectations are given exactly by

$$E(x') = m^2, \qquad E(y') = \frac{m^2}{k}. \qquad (3)$$

Thus, y' − x'/k has zero expectation.
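Equations (1) and (2) translate directly into code. The Python sketch below (function name hypothetical, sample counts invented) computes ū, s², x' = ū² − s²/N, y' = s² − ū and the moment estimate k_1 = ū²/(s² − ū) for one sample. It returns k_1 as None when s² ≤ ū, where the moment estimate is not a usable positive value, while x' and y' are kept with their signs.

```python
def moment_statistics(counts):
    """Return (mean, s2, x', y', k1) for one sample of counts, following (1) and (2)."""
    N = len(counts)
    mean = sum(counts) / N
    s2 = sum((u - mean) ** 2 for u in counts) / (N - 1)
    x_prime = mean ** 2 - s2 / N  # unbiased for m**2
    y_prime = s2 - mean           # unbiased for m**2 / k
    k1 = mean ** 2 / y_prime if y_prime > 0 else None
    return mean, s2, x_prime, y_prime, k1


# Invented counts for illustration
print(moment_statistics([0, 2, 5, 1, 12, 3, 0, 7]))
```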
For a single sample, we have the ratio y'/x' as an estimate of 1/k. The efficiency of this
estimate is the same as the efficiency of k_1 above, whose large-sample variance (Anscombe,
1950) is

$$V(k_1) = \frac{2k(k+1)(m+k)^2}{N m^2}, \qquad (4)$$

correct to order 1/N. The large-sample variance of y'/x', therefore, is

$$V\!\left(\frac{y'}{x'}\right) = \frac{V(k_1)}{k^4} = \frac{2(k+1)(m+k)^2}{N k^3 m^2}. \qquad (5)$$

The use of 1/k as the parameter for estimation (with estimate y'/x' in the case of a single
sample) has several advantages. As noted by Anscombe (1950), its bias is small, being in
fact of order 1/N². The statistic k_1 or its modified form x'/y' (which has the same efficiency
as k_1), however, has a positive bias which may be seriously large, namely

$$\frac{2(k+1)(m+k)^2}{N m^2} \qquad (6)$$

(correct to terms in 1/N). Thus, when considering the working of examples, neither k_1 nor
x'/y' is easily comparable with the maximum likelihood estimate k̂, because of the large
bias of k_1 and the unknown bias of k̂.
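To see how quickly the bias of k_1 can outgrow its sampling error, the sketch below evaluates V(k_1) = 2k(k + 1)(m + k)²/(Nm²), V(y'/x') = V(k_1)/k⁴ and the bias 2(k + 1)(m + k)²/(Nm²), which equals k³V(y'/x') by the usual delta-method argument. These are the forms of (4) to (6) as read here from the scan; function names and parameter values are illustrative. With m = 1, k = 2 and N = 50, for instance, the bias is 1·08, more than half of k itself.

```python
def var_k1(m, k, N):
    """Large-sample variance of the moment estimate k1, equation (4)."""
    return 2 * k * (k + 1) * (m + k) ** 2 / (N * m ** 2)


def var_inverse_k(m, k, N):
    """Large-sample variance of y'/x' as an estimate of 1/k, equation (5)."""
    return 2 * (k + 1) * (m + k) ** 2 / (N * k ** 3 * m ** 2)


def bias_k1(m, k, N):
    """Leading positive bias of k1, equation (6); equals k**3 * var_inverse_k."""
    return 2 * (k + 1) * (m + k) ** 2 / (N * m ** 2)


for m in (1.0, 10.0, 100.0):
    print(m, round(bias_k1(m, 2.0, 50), 3), round(var_k1(m, 2.0, 50) ** 0.5, 3))
```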

When a common value for k is 
drawn from populations with different means m_i, the idea of estimating 1/k from the ratio


_ 2m?(m +k)? 2b—105 
iir. ird b . (7) 
The inverse variance, w = 1/V, is of the nature of a weight, which may be written as


=- CN- l 
= KE1)- (2E— 1)/N—3/N? m*(m + k} N 


If w is calculated by replacing m by its efficient estimate ū, the expectation m² by x', and k by
an empirical trial value k', we can consider the weighted sum of squares

$$\sum w\left(y' - \frac{x'}{k}\right)^2 \qquad (9)$$

as approximately a χ². This χ² is minimized by choosing 1/k to satisfy the equation
Σ{wx'(y' − x'/k)} = 0, that is, by

$$\frac{1}{k_c} = \frac{\sum (w x' y')}{\sum (w x'^2)}. \qquad (10)$$


This method clearly obtains an estimate of 1/k as the slope of a linear regression of y' on
x', the regression line being constrained to pass through the origin (x' = 0, y' = 0). An
initial plotting of the regression suggests a possible trial value of 1/k, determined as the 
slope of a line through the origin fitted by some simple procedure. Further, it enables a 
preliminary assessment of the heterogeneity of the various samples in respect to k. 

When successive approximations have led to an estimated 1/k_c which differs negligibly
from its last trial value, the calculation provides a χ² test for the homogeneity of k in the
different samples. Since the theoretical regression line is constrained to pass through the
origin, the observed value of

$$\chi^2 = \sum (w y'^2) - \frac{\left[\sum (w x' y')\right]^2}{\sum (w x'^2)} \qquad (11)$$

may be compared with the tabular χ² distribution. (Here and elsewhere Σ²(...) = [Σ(...)]².)

(2·2). Trial estimates of k. The initial trial estimate of a common k may be obtained
graphically or as a simple ratio. The statistics x' and y' for each of the g distributions in a
series may be computed from ū, s² and N by equation (2), or directly from the totals Σu (= U)
and Σu² of N unit counts, as

$$x' = \frac{U^2 - \sum u^2}{N(N-1)}, \qquad y' = \frac{N \sum u^2 - U^2 - (N-1)U}{N(N-1)}. \qquad (12)$$

Occasional values of y' which may be near zero or negative are included, of course, with
the others, each with its proper sign.
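Equation (12) avoids computing ū and s² separately. A minimal sketch using exact fractions follows (function name hypothetical). Applied to the first row of Table 2 (N = 10, U = 556, Σu² = 58,760) it gives x' ≈ 2782·0 and y' ≈ 3038·4, and on any raw sample it agrees exactly with equation (2).

```python
from fractions import Fraction


def xy_from_totals(N, U, sum_sq):
    """Equation (12): x' and y' from N, U = sum(u) and sum(u**2), as exact fractions."""
    denom = N * (N - 1)
    x_prime = Fraction(U * U - sum_sq, denom)
    y_prime = Fraction(N * sum_sq - U * U - (N - 1) * U, denom)
    return x_prime, y_prime


# First row of Table 2: N = 10 tows, U = 556, sum of squares 58,760
x, y = xy_from_totals(10, 556, 58760)
print(round(float(x), 1), round(float(y), 1), round(float(y / x), 4))
```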

The diagram is fitted provisionally by a straight line passing through the origin with
a slope selected graphically or approximated by

$$b = \frac{1}{k'} = \frac{\sum y'}{\sum x'}. \qquad (13)$$


When ū, or wx' in the later calculation, does not differ markedly among series, this
unweighted estimate is often a good approximation to 1/k_c computed from equation (10).
For a graphic test of apparent non-linearity in the regression, the ratio y'/x' for each sample
may be plotted against its mean ū. The plotted points should agree substantially with a
horizontal line if the entire series of distributions can be described with a single k_c. This
diagram also aids in spotting gross outliers. Since the ratios y'/x' are needed later in the
analysis, enough places should be recorded to avoid subsequent rounding errors. The sum
of these ratios, g in number, provides still another estimate of the provisional k,

$$k' = \frac{g}{\sum (y'/x')}, \qquad (14)$$

of value when ū varies excessively.
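The two trial values (13) and (14) are simple sums. The sketch below (function names hypothetical) computes both from the x' and y' columns of Table 2, with the second y' entry taken as 235·4, the value implied by its N, U and Σu² through (12). It gives k' ≈ 0·155 from (13) and k' ≈ 0·232 from (14). It also shows the harmonic mean of two trial values, as used later in the text for the wireworm counts (2·364 and 1·631).

```python
from statistics import harmonic_mean

# x' and y' for the 15 haddock series of Table 2 (second y' taken as 235.4)
x_prime = [2782.0, 347.0, 728.4, 3874.0, 1994.4, 189.5, 2582.7, 48256.4,
           320.2, 1657.2, 67495.3, 13.18, 3870.3, 4331.2, 247.4]
y_prime = [3038.4, 235.4, 1363.0, 31210.8, 10961.3, 401.1, 4859.8, 123724.5,
           163.2, 8674.2, 633109.4, 35.0, 59614.2, 13650.1, 1136.9]


def trial_k_ratio_of_sums(xs, ys):
    """Equation (13): 1/k' = sum(y') / sum(x'), returned as k'."""
    return sum(xs) / sum(ys)


def trial_k_mean_of_ratios(xs, ys):
    """Equation (14): k' = g / sum(y'/x') over the g series."""
    return len(xs) / sum(y / x for x, y in zip(xs, ys))


k13 = trial_k_ratio_of_sums(x_prime, y_prime)
k14 = trial_k_mean_of_ratios(x_prime, y_prime)
print(round(k13, 3), round(k14, 3))
# Combining two trial values by their harmonic mean
print(round(harmonic_mean([2.364, 1.631]), 2))
```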


The first provisional estimate is illustrated by distributions from ‘plots’ within subareas,
or ‘blocks’. These have been superimposed upon Beall’s (1939) counts of adult Colorado
potato beetles (Leptinotarsa decemlineata Say) in each two-foot unit of row in an untreated,
heavily infested field of potatoes. Each of sixteen subareas or blocks was divided into eight
plots, each two rows wide and 10 ft. long and separated by single guard rows as would be
customary in a field experiment. With u equal to the total count on the ten units in each
plot, ū and s² (Table 1) were computed for each subarea from its eight plots (N = 8), leading


Table 1. Estimation of k_c in eight plots (N = 8) within sixteen blocks, from counts (u) of
Leptinotarsa decemlineata in a uniformly treated potato field (Beall, 1939); x' and y'
computed by equation (2)

   ū        s²         x'          y'        y'/x'      (ū + k')²    wx'        wy'
 75·75    539·07    5,670·7     463·32    0·08170    7,972·7     0·07623    0·006228
 84·00    627·14    6,977·6     543·14    0·07784    9,514·1     0·06388    0·004972
 42·25    257·93    1,752·8     215·68    0·12305    3,112·5     0·19526    0·024027
 32·50     77·43    1,046·6      44·93    0·04293    2,119·7     0·28671    0·012308
 48·00    172·57    2,282·4     124·57    0·05458    3,787·2     0·16047    0·008758
 48·62     91·12    2,352·5      42·50    0·01807    3,863·9     0·15729    0·002842
 66·50    737·43    4,330·1     670·93    0·15495    6,406·4     0·09486    0·014699
 56·75     71·64    3,211·6      14·89    0·00464    4,940·7     0·12301    0·000571
 64·88    101·27    4,196·8      36·39    0·00867    6,149·7     0·09882    0·000857
 52·12    417·55    2,664·3     365·43    0·13716    4,311·2     0·14097    0·019335
 40·12    137·55    1,592·4      97·43    0·06118    2,879·4     0·21106    0·012913
 26·62     48·55      702·6      21·93    0·03121    1,612·8     0·37682    0·011761
 34·12    252·41    1,132·6     218·29    0·19273    2,271·5     0·26755    0·051565
 34·12     14·70    1,162·3     −19·42   −0·01671    2,271·5     0·26755   −0·004471
 13·88     45·84      186·9      31·96    0·17100      751·9     0·80827    0·138214
 37·62    167·70    1,394·3     130·08    0·09329    2,617·3     0·23220    0·021662
Total
757·85  3,759·90   40,656·5   3,002·05    1·23629                3·56095    0·326241

k' = 40,656·5/3,002·05 = 13·54 (eqn. 13). wx' = 607·74/(ū + k')² (eqn. 16).
Σ(wx'y') = 410·3434, k_c = 13·507 (eqn. 17).
B_0 = 30·3806 (eqn. 21). χ² = 19·214 (eqn. 11), n = 15.
a = 0·0063082, C = 16·8722 (eqn. 23).
[wx'²] = 3532·265, [wx'y'] = 226·1819, [wy'²] = 32·7221 (eqn. 22). B = 14·4831 (eqn. 24).

Effect of                          D.F.    M.S.
Slope, 1/k_c                        1      30·3806
Computed intercept against 0        1       0·9747
Error                              13       1·4030

1/k_c = 0·0740 ± 0·0134, t = 1·960 at P = 0·05.
Confidence limits: for 1/k_c, 0·10036 and 0·04771; for k_c, 9·964 and 20·960.
to x', y', and y'/x' in the next three columns. In Fig. 1, y' has been plotted against x' and
fitted provisionally with 1/k' = 3002·05/40,656·5 = 0·0738 (equation 13). No untoward
trend is evident, either in Fig. 1 or in the plot of y'/x' against ū in Fig. 2.

Distributions with a variable N are represented in Table 2 by the trawl catches of haddock
on Georges Bank over three summers (Taylor, 1953). The number of tows (N) at each of
three depths in six subareas varied more than tenfold. Three series have been omitted:
two which gave no information regarding k (x' = y' = 0), and one with a single very large
catch for which y'/x' = 64·50. In the remaining 15 series the number of tows varied from
4 to 47, a plot of y'/x' against ū revealing no marked trend either up or down.

Fig. 1. Regression estimate of k_c for the distribution of the beetle Leptinotarsa in eight plots within
each of sixteen blocks (Table 1). The slope of the weighted regression is 1/k_c = 0·0740 ± 0·0134.

Fig. 2. Relation in each block of Table 1 of the ratio y'/x' to the mean number of beetles (ū) per plot.


Table 2. Estimation of k_c from trawl catches of haddock (Taylor, 1953),
N variable; x' and y' computed by equation (12)

Subarea  Depth   N    U = Σu    Σu²          N(N − 1)   x'          y'           ū        y'/x'
G        I       10   556       58,760       90         2,782·0     3,038·4      55·60    1·0922
         II      10   193       6,017        90         347·0       235·4        19·30    0·6784
         III     42   1,159     88,997       1,722      728·4       1,363·0      27·60    1·8712
H        I       15   1,158     527,430      210        3,874·0     31,210·8     77·20    8·0565
         II      13   693       169,117      156        1,994·4     10,961·3     53·31    5·4960
         III     29   414       17,542       812        189·5       401·1        14·28    2·1166
J        I        4   247       30,017       12         2,582·7     4,859·8      61·75    1·8817
         II      26   5,987     4,477,491    650        48,256·4    123,724·5    230·27   2·5639
         III     19   345       9,529        342        320·2       163·2        18·16    0·5097
M        I       18   833       186,797      306        1,657·2     8,674·2      46·28    5·2343
         II      41   11,808    28,736,600   1,640      67,495·3    633,109·4    288·00   9·3801
         III     11   45        575          110        13·18       35·0         4·09     2·6555
N        I       16   1,395     1,017,147    240        3,870·3     59,614·2     87·19    15·4030
         II      38   2,603     685,893      1,406      4,331·2     13,650·1     68·50    3·1516
         III     47   775       65,833       2,162      247·4       1,136·9      16·49    4·5954
Total                                                   138,689·2   892,177·3             64·6861


¥=15 * : = . 0-001399(N— 1) 
/(64-6861) = 0-23 (eqn. 14). 4 0-2839 0.54 — 3JN* (eqn. 15). 
Σ(wx'²) = 1·2831, Σ(wx'y') = 5·3178, k_c = 0·2413 (eqn. 17).
Σ(wy'²) = 35·204, B_0 = 22·040 (eqn. 21), χ² = 13·224, n = 14.
1/k_c = 4·145 ± 0·8830, t = 1·960 at P = 0·05.
Confidence limits: for 1/k_c, 5·8759 and 2·4143; for k_c, 0·1702 and 0·4142.


175 sampling units of soil in each of twenty-four fields gave the means (ū) and variances (s²)

k to this series might well be questioned.

(2·3). The weighted estimate of k_c. Since the component distributions do not contribute
equally to the estimation of k_c, as


re 


4 050. 1) 
E- I/ -3]NV (15) 
If N is the same for each distribution, A is a constant; if N varies, k' or N may be large
enough to warrant omitting the term 3/N². In any case, wx' for each distribution is obtained
directly as

$$wx' = \frac{A}{(\bar u + k')^2}. \qquad (16)$$

Summing the products of wx' by x' and of wx' by y' leads to the estimate

$$k_c = \frac{\sum (w x'^2)}{\sum (w x' y')}. \qquad (17)$$
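The weighted steps can be run end to end on the Table 1 data. The sketch assumes wx' = A/(ū + k')² with A = 607·74 and k' = 13·54 taken from the Table 1 footnote (the general formula (15) for A is not reproduced here), w = (wx')/x', k_c = Σ(wx'²)/Σ(wx'y') as in (17), χ² = Σ(wy'²) − [Σ(wx'y')]²/Σ(wx'²) as in (11) on g − 1 degrees of freedom, and a standard error of 1/k_c equal to 1/√Σ(wx'²), an assumption that matches the ±0·0134 reported with Table 1. It reproduces k_c ≈ 13·51, 1/k_c ≈ 0·0740 ± 0·0134 and χ² ≈ 19·2 on 15 d.f. The function name is hypothetical.

```python
import math

# Table 1 (Beall's Colorado potato beetle counts, N = 8 plots per block):
# block means u-bar and the statistics x', y' of equation (2)
ubar = [75.75, 84.00, 42.25, 32.50, 48.00, 48.62, 66.50, 56.75,
        64.88, 52.12, 40.12, 26.62, 34.12, 34.12, 13.88, 37.62]
x = [5670.7, 6977.6, 1752.8, 1046.6, 2282.4, 2352.5, 4330.1, 3211.6,
     4196.8, 2664.3, 1592.4, 702.6, 1132.6, 1162.3, 186.9, 1394.3]
y = [463.32, 543.14, 215.68, 44.93, 124.57, 42.50, 670.93, 14.89,
     36.39, 365.43, 97.43, 21.93, 218.29, -19.42, 31.96, 130.08]


def weighted_common_k(ubar, x, y, A, k_trial):
    """Weights (16), estimate (17)/(10) and homogeneity chi-square (11)."""
    wx = [A / (u + k_trial) ** 2 for u in ubar]                   # equation (16)
    s_xx = sum(a * xi for a, xi in zip(wx, x))                    # sum(w x'^2)
    s_xy = sum(a * yi for a, yi in zip(wx, y))                    # sum(w x' y')
    s_yy = sum(a * yi * yi / xi for a, xi, yi in zip(wx, x, y))   # sum(w y'^2)
    inv_k = s_xy / s_xx                                           # equation (10)
    se = 1.0 / math.sqrt(s_xx)
    chi2 = s_yy - s_xy ** 2 / s_xx                                # equation (11)
    return 1.0 / inv_k, inv_k, se, chi2, len(x) - 1


k_c, inv_k, se, chi2, df = weighted_common_k(ubar, x, y, A=607.74, k_trial=13.54)
lower, upper = inv_k - 1.96 * se, inv_k + 1.96 * se
print(f"k_c = {k_c:.3f}, 1/k_c = {inv_k:.4f} +/- {se:.4f}, chi2 = {chi2:.2f} on {df} d.f.")
print(f"95% limits for k_c: {1 / upper:.2f} to {1 / lower:.2f}")
```

If k_c differs appreciably from k', the same function can simply be called again with `k_trial=k_c` (and the corresponding A), which is the iteration the text describes.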


Table 3. Analysis of counts of wireworms (Limonius) in soil from N = 175 square-foot units in each of twenty-four irrigated fields in southern Washington (Jones, 1937)

[Entries for the individual fields are illegible in the source.]

1/k' = 0.4231 (eqn. 13), 1/k' = 0.6131 (eqn. 14); provisional k' = 1.93.
Second approximation: Σ(wx'²) = 2326.28, Σ(wx'y') = 1118.63, k_c = 2.080 (eqn. 17).
χ² = 99.06, n = 22 (eqn. 11).
Unweighted regression of y'/x' on log x̄: b = −0.626 ± 0.091.



Negative binomial distributions with a common k
If k_c should differ appreciably from its trial value k', the columns of (x̄ + k')² and of wx' are redetermined, with the initial k' in equation (16) replaced by k_c. This may have a relatively small effect upon the estimate of k_c, but it is essential for a valid test of homogeneity.

These steps are illustrated in Table 1 for N constant and in Table 2 for N variable. When the entries of wx' are relatively stable, with few varying more than fivefold, the computed estimate (k_c by equation (17)) usually differs but little from its unweighted trial value (k' by equation (13)). Thus in Table 1, k' = 13.54 and k_c = 13.507. With large differences in x̄, as in Table 2, equation (14) may give a better trial value (k' = 0.232) than equation (13) (k' = 0.155), as confirmed by the weighted estimate, k_c = 0.2413.

Fig. 3. Regression estimate of k_c for the distribution of wireworms in 175 sampling units in each of twenty-four irrigated fields (Table 3). The slope of the weighted regression is 1/k_c = 0.481.

Fig. 4. Relation of the observed 1/k_i = y'/x' in each field of Table 3 to log x̄, where x̄ is the mean …

If we ignore the apparent trend of 1/k_i in Fig. 4, we may compute a common k for the wireworm counts in Table 3. Here the harmonic mean (k' = 1.93) of the trial values defined by equation (13) (k' = 2.364) and by equation (14) (k' = 1.631) led to the initial weighted estimate, k_c = 2.069. A second approximation gave k_c = 2.080, a change of only 0.5% from the first weighted estimate.

(2·4). Grouped distributions. Cases arise with relatively many distributions or samples, sometimes with few individual counts in each component sample, as when the number of survivors in each plot of an insecticide test is recorded in two or more equal subplots. Thus, in the insecticidal experiment upon leather-jackets reported by Bartlett (1936), the number of survivors was counted in two subareas on each plot. Grouping facilitates the analysis by reducing the number of entries that need be handled. We shall consider only the case where the number of counts N is constant in each sample.

The component samples are grouped into approximately equal intervals on the basis of the total count U in each plot. Although an average x' and y' is determined for each grouping interval, a separate x' and y' need not be determined for each individual sample. For N ≥ 3, equation (12) is solved for each grouping interval with four terms: the number of samples f, the total count ΣU in the f samples, the sum of their squared totals ΣU², and the pooled total sum of squares of the individual counts u, ΣΣu². Then

$$x' = \frac{\Sigma U^2 - \Sigma\Sigma u^2}{fN(N-1)}, \qquad y' = \frac{N\,\Sigma\Sigma u^2 - \Sigma U^2}{fN(N-1)} - \frac{\Sigma U}{fN}. \qquad (18)$$

When N = 2, these equations simplify to

$$x' = \frac{\Sigma U^2 - \Sigma d^2}{4f} \quad\text{and}\quad y' = \frac{\Sigma d^2 - \Sigma U}{2f}, \qquad (18a)$$


where d is the difference between the two counts in each sample. The weight for each x' and y' is that given in equation (16) multiplied by f. For each grouping interval, the mean count is ū = ΣU/(fN) and

$$wx' = \frac{Af}{(\bar{u} + k')^2}. \qquad (19)$$
As in equation (13), a trial estimate of k may be computed from the totals of each term in the numerators of x' and y' in equations (18) or (18a). Alternatively, if ū varies markedly, equation (14) may be modified to

$$k' = \frac{\Sigma f}{\Sigma(f\,y'/x')}. \qquad (20)$$


The weighted estimate is frequently intermediate between these two trial values. The estimation of k_c from grouped distributions is illustrated in Table 4 with the initial potato beetle counts within each of the 128 individual plots summarized in Table 1. Each plot consisted of N = 10 initial counts, and all distinctions between blocks were disregarded in grouping the plots. The computed k_c = 5.070 agrees well enough with the provisional k' = 5.13 by equation (18), solved with the totals for the series, to need only one approximation.
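A minimal Python sketch of equations (18) and (18a), together with the trial k' obtained from the series totals, is given below; the function names are illustrative only. The check values come from the 18 to 22 grouping interval and the totals of Table 4. The N = 2 shortcut is checked against equation (18) by writing ΣΣu² = (ΣU² + Σd²)/2, which holds because u₁² + u₂² = (U² + d²)/2 for each pair of counts.

```python
def grouped_xy(f, n, sum_U, sum_U2, sum_u2):
    """Average x' and y' for f samples of n counts each (eq. 18)."""
    denom = f * n * (n - 1)
    x = (sum_U2 - sum_u2) / denom
    y = (n * sum_u2 - sum_U2) / denom - sum_U / (f * n)
    return x, y


def grouped_xy_pairs(f, sum_U, sum_U2, sum_d2):
    """Shortcut for n = 2, using the within-pair differences d (eq. 18a)."""
    x = (sum_U2 - sum_d2) / (4 * f)
    y = (sum_d2 - sum_U) / (2 * f)
    return x, y


def trial_k_from_totals(n, sum_U, sum_U2, sum_u2):
    """Trial k' as the ratio of the numerators of x' and y' in eq. (18), summed over the series."""
    num_x = sum_U2 - sum_u2
    num_y = n * sum_u2 - sum_U2 - (n - 1) * sum_U
    return num_x / num_y


# Table 4, interval U = 18-22: f = 7 plots of N = 10 counts.
print(grouped_xy(7, 10, 146, 3052, 496))                       # x' ≈ 4.057, y' ≈ 0.943
# Table 4 totals: k' = 308,350/60,056.
print(round(trial_k_from_totals(10, 6063, 355347, 46997), 2))  # 5.13
# Table 6 (N = 2): ΣU = 1334, ΣU² = 85,256, Σd² = 4176.
pairs_u2 = (85256 + 4176) / 2
print(round(trial_k_from_totals(2, 1334, 85256, pairs_u2), 2))  # 14.26
```

The last line reproduces k' = 81,080/5684 = 14.26 used for the leather-jacket counts, which shows that the N = 2 shortcut is only an algebraic rearrangement of equation (18).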


Table 4. Estimation of k_c within 128 plots of N = 10 initial counts of Leptinotarsa, from the same data as Table 1; plots grouped by size of total count, U = Σu, with f plots in each grouping interval

U       | f   | ΣU    | ΣΣu²   | ΣU²     | x'     | y'     | ū      | y'/x'
0–17    | 7   | 71    | 171    | 801     | 1.000  | 0.4286 | 1.014  | 0.4286
18–22   | 7   | 146   | 496    | 3,052   | 4.057  | 0.9429 | 2.086  | 0.2324
23–27   | 9   | 225   | 957    | 5,643   | 5.785  | 2.348  | 2.500  | 0.4059
28–32   | 19  | 571   | 2,607  | 17,193  | 8.530  | 2.186  | 3.005  | 0.2563
33–37   | 10  | 353   | 1,737  | 12,477  | 11.933 | 1.907  | 3.530  | 0.1598
38–42   | 11  | 444   | 2,432  | 17,934  | 15.658 | 2.414  | 4.036  | 0.1542
43–47   | 4   | 183   | 1,071  | 8,377   | 20.29  | 1.906  | 4.575  | 0.0939
48–52   | 13  | 645   | 4,447  | 32,027  | 23.57  | 5.674  | 4.962  | 0.2407
53–57   | 8   | 440   | 3,142  | 24,216  | 29.27  | 4.506  | 5.500  | 0.1539
58–62   | 11  | 653   | 5,027  | 38,779  | 34.09  | 5.671  | 5.936  | 0.1664
63–67   | 8   | 523   | 4,111  | 34,203  | 41.79  | 3.056  | 6.538  | 0.0731
68–72   | 5   | 352   | 3,262  | 24,790  | 47.84  | 10.360 | 7.040  | 0.2166
73–82   | 6   | 469   | 4,719  | 36,699  | 59.22  | 11.611 | 7.817  | 0.1961
83–92   | 3   | 262   | 3,058  | 22,902  | 73.50  | 19.704 | 8.733  | 0.2681
93–102  | 5   | 484   | 6,144  | 46,874  | 90.51  | 22.689 | 9.680  | 0.2507
103–132 | 2   | 242   | 3,616  | 29,380  | 143.13 | 25.567 | 12.100 | 0.1786
Total   | 128 | 6,063 | 46,997 | 355,347 |        |        |        | 3.4753

k' = 308,350/60,056 = 5.13 (eqn. 18 solved with totals). wx' = 102.21f/(ū + k')² (eqn. 19).
Σ(wx'²) = 2768.528, Σ(wx'y') = 546.1114, k_c = 5.070 (eqn. 17).
Σ(wy'²) = 121.0497, B₁² = 107.7243, χ² = 13.3254, n = 14.
1/k_c = 0.1973 ± 0.0190, t = 1.960 at P = 0.05.
Confidence limits: for 1/k_c, 0.2345 and 0.1600; for k_c, 4.26 and 6.25.


For a second example, grouping may be applied to the lesion counts on each half of each of the first two leaves in 120 potted bean seedlings, as reported by Kleczkowski (1949) (Table 5), in which 1/k tended to decrease as the mean increased. The provisional k' = 16.3 from equation (20) gave a computed k_c = 17.82 and the next approximation k_c = 17.90, which is near the lower of the limits of 15 and 84 given by Kleczkowski for his unweighted intercept estimate of k.
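Equation (20) is a frequency-weighted harmonic mean of the interval values k_i = x'/y'. As an illustrative sketch (the function name is hypothetical), the f and y'/x' columns of Table 5 reproduce the provisional k' = 120/7.38161 quoted beneath that table; the second ratio is taken as 0.09674, i.e. 70.83/732.2 from the same row.

```python
def trial_k_grouped(freqs, ratios):
    """Trial k' = Σf / Σ(f·y'/x') for grouped distributions (eq. 20)."""
    return sum(freqs) / sum(f * r for f, r in zip(freqs, ratios))


# Table 5: plants per interval (f) and the observed y'/x' in each interval.
f = [4, 3, 4, 6, 11, 10, 15, 10, 9, 4, 4, 9, 6, 8, 7, 6, 4]
r = [0.19868, 0.09674, 0.09880, 0.09153, 0.06285, 0.08657, 0.04543, 0.03739,
     0.08178, 0.05299, 0.04941, 0.03475, 0.04799, 0.04971, 0.03422, 0.02334, 0.05408]

print(sum(f))                           # 120
print(round(trial_k_grouped(f, r), 1))  # 16.3
```

Weighting each interval by f, rather than counting intervals equally as in equation (14), keeps sparsely filled intervals from dominating the trial value.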


The abbreviated calculation with equation (18a) for two counts in each plot is illustrated in Table 6 with Bartlett's (1936) leather-jacket counts from an insecticide experiment in randomized blocks. The differences between the two subplots in each plot have been grouped on the basis of the plot totals U (Table 7), ignoring differences between treatments and between blocks. Starting with the unweighted estimate k' = 14.26, weighting gave k_c = 19.00, which was changed in a second approximation to k_c = 18.32.

(2·5). Tests for agreement with a single k_c. Agreement with a single k_c may be judged initially by an overall χ², which follows from weighting each contribution inversely as its variance. A column of wy' is added to the work-form, most easily as the product (wx')(y'/x') = wy', and the products y' × wy' are summed over the g distributions in the


Table 5. Estimation of k_c with grouping; numbers of lesions of tobacco necrosis virus on N = 4 half-leaves of 120 bean seedlings (Kleczkowski, 1949)

Class limit | No. of plants, f | ΣU    | ΣU²        | ΣΣu²      | ū (lesions per half-leaf) | x'      | y'     | y'/x'
25          | 4                | 246   | 17,066     | 5,048     | 15.38                     | 250.4   | 49.75  | 0.19868
35          | 3                | 330   | 36,324     | 9,966     | 27.50                     | 732.2   | 70.83  | 0.09674
45          | 4                | 671   | 113,183    | 30,833    | 41.94                     | 1,715.6 | 169.50 | 0.09880
55          | 6                | 1,204 | 242,472    | 65,570    | 50.17                     | 2,457   | 224.9  | 0.09153
65          | 11               | 2,634 | 632,576    | 167,428   | 59.86                     | 3,524   | 221.5  | 0.06285
75          | 10               | 2,769 | 768,337    | 206,323   | 69.22                     | 4,683   | 405.4  | 0.08657
85          | 15               | 4,847 | 1,567,991  | 408,801   | 80.78                     | 6,440   | 292.6  | 0.04543
95          | 10               | 3,597 | 1,295,857  | 335,637   | 89.92                     | 8,002   | 299.2  | 0.03739
105         | 9                | 3,547 | 1,398,947  | 373,365   | 98.53                     | 9,496   | 776.6  | 0.08178
115         | 4                | 1,722 | 741,506    | 193,922   | 107.62                    | 11,408  | 604.5  | 0.05299
125         | 4                | 1,907 | 909,229    | 237,041   | …                         | …       | 692.0  | 0.04941
135         | 9                | 4,635 | 2,387,721  | 615,801   | …                         | …       | …      | 0.03475
145         | 6                | 3,336 | 1,856,240  | 483,038   | …                         | …       | …      | 0.04799
155         | 8                | 4,818 | 2,902,168  | 755,832   | …                         | …       | …      | 0.04971
175         | 7                | 4,550 | 2,960,868  | 762,440   | …                         | …       | …      | 0.03422
205         | 6                | 4,504 | 3,388,422  | 865,206   | …                         | …       | …      | 0.02334
275         | 4                | 3,801 | 3,635,873  | 948,159   | …                         | …       | …      | 0.05408
Total       | 120              | …     | 24,854,780 | 6,464,410 |                           |         |        |

First estimate: trial k' = 120/7.38161 = 16.3 (eqn. 20). A = 386.59 (eqn. 15).
Σ(wx'²) = 32,042.0, Σ(wx'y') = 1798.02, k_c = 17.821 (eqn. 17).
Second estimate: A = 463.24, Σ(wx'²) = 37,339.3, Σ(wx'y') = 2085.46, k_c = 17.905 (eqn. 17).
Σ(wy'²) = 141.204, B₁² = 116.476 (eqn. 21), χ² = 24.729, n = 15 (eqn. 11).
Σw = 0.00879575, [wx'²] = 31,439.3, [wx'y'] = 1495.26, [wy'²] = 82.1657 (eqn. 22).
B₂² = 71.115 (eqn. 24). C = 59.039 (eqn. 23).

Effect of                    | D.F. | S.S.    | M.S.
Slope, 1/k_c                 | 1    | 116.476 | 116.476
Computed intercept against 0 | 1    | 13.678  | 13.678
Error                        | 14   | 11.051  | 0.7893
series to obtain Σ(wy'²). Part of this total, with 1 D.F., can be attributed to the slope of the line fitted with a zero intercept,

$$B_1^2 = \frac{\Sigma^2(wx'y')}{\Sigma(wx'^2)}. \qquad (21)$$

The remainder is an approximate χ², χ² = Σ(wy'²) − B₁² (equation (11)), nominally with g − 2 D.F. Unlike k_c, χ² is relatively sensitive to a discrepancy between the estimated k_c and the provisional k'. Before the column of wy' is calculated, these should agree substantially, which may require computing an additional cycle.
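The partition in equations (21) and (11) needs only three weighted sums. The following is a minimal sketch with a hypothetical function name, checked against the figures printed beneath Tables 4 and 5; the degrees of freedom follow the text's g − 2. For Table 4, Σ(wx'²) is taken as 2768.528, the value consistent with the printed k_c = 5.070 and B₁² = 107.7243.

```python
def chi_square_about_origin(s_wxx, s_wxy, s_wyy, g):
    """Split Σ(wy'²) into the slope through the origin (eq. 21) and χ² (eq. 11).

    Arguments are Σ(wx'²), Σ(wx'y'), Σ(wy'²) and the number g of distributions.
    Returns (k_c, B1², χ², degrees of freedom).
    """
    k_c = s_wxx / s_wxy          # eq. (17)
    b1_sq = s_wxy**2 / s_wxx     # eq. (21)
    chi2 = s_wyy - b1_sq         # eq. (11)
    return k_c, b1_sq, chi2, g - 2


# Table 4: sixteen grouping intervals.
k_c, b1_sq, chi2, df = chi_square_about_origin(2768.528, 546.1114, 121.0497, 16)
print(round(k_c, 3), round(b1_sq, 4), round(chi2, 4), df)
```

A χ² close to its degrees of freedom (13.33 on 14 here) is the first sign that a single k_c is adequate; the intercept test below refines this.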


Table 6. Estimation of k_c from subplot counts (N = 2) in an experiment in randomized blocks for the control of leather-jackets (Bartlett, 1936); for plot totals U, see Table 7

Differences between u within plots, d, at each % of toxicant: [entries illegible in the source]

ΣU² = 85,256 (from Table 7), Σd² = 4176.

Estimation of k_c within plots from grouped data

U       | f | ū     | x'     | y'     | y'/x'    | wx' (first) | wx' (second)
0–9     | 5 | 3.10  | 9.40   | 0.400  | 0.04255  | 1.6888      | 1.8492
10–19   | 9 | 6.83  | 45.89  | −0.111 | −0.00242 | 2.0598      | 2.4366
20–29   | 7 | 12.36 | 144.1  | 9.00   | 0.06244  | 1.0056      | 1.2858
30–49   | 5 | 21.90 | 475.6  | −2.40  | −0.00505 | 0.3893      | 0.5399
50–69   | 6 | 30.08 | 859.8  | 70.00  | 0.08141  | 0.3107      | 0.4499
80–99   | 2 | 43.25 | 1792.5 | 128.0  | 0.07141  | 0.0616      | …
120–139 | 2 | 63.50 | 3839.5 | 346.5  | 0.09025  | 0.0337      | …

First estimate: k' = 81,080/5684 = 14.26 (eqn. 18). wx' = 101.80f/(ū + k')² (eqn. 19).
Σ(wx'²) = 947.40, Σ(wx'y') = 49.874, k_c = 18.997 (eqn. 17).
Second estimate: k' = 19.00, wx' = 180.63f/(ū + k')², k_c = 1329.09/72.567 = 18.315; χ² = 1.875, n = 5 (eqn. 11).
Confidence limits: for 1/k_c, 0.05460 ± 0.05376; for k_c, 9.23 and 1190.


For a second test of validity, an intercept component with 1 D.F. may be split off from the approximate χ². It measures the difference between two straight lines, one fitted with and the other without the constraint of a zero intercept; the remainder provides an empirical error for testing the significance of the difference. An additional term is needed, the sum of the weights Σw, which may be obtained by successive cumulative division of wx' by x'. The weighted sums of squares and products for the g series are then reduced to deviations about their means, rather than about zero, by computing

$$[wx'^2] = \Sigma(wx'^2) - \frac{\Sigma^2(wx')}{\Sigma w}, \quad [wx'y'] = \Sigma(wx'y') - \frac{\Sigma(wx')\,\Sigma(wy')}{\Sigma w}, \quad [wy'^2] = \Sigma(wy'^2) - C, \qquad (22)$$

where

$$C = \frac{\Sigma^2(wy')}{\Sigma w}. \qquad (23)$$

The variation attributable to the slope of the line passing through the origin, B₁², is defined in equation (21); that accounted for by a slope without this constraint is

$$B_2^2 = \frac{[wx'y']^2}{[wx'^2]}. \qquad (24)$$


The required test may be arranged as an analysis of variance:

Effect of                    | D.F.  | S.S.              | M.S. | F
Slope, 1/k_c                 | 1     | B₁²               | B₁²  | B₁²/s²
Computed intercept against 0 | 1     | C + B₂² − B₁² = I | I    | I/s²
Error                        | g − 3 | [wy'²] − B₂²      | s²   |


If a single k_c is justified, the F value in the first row should be clearly significant and that in the second row not significant. A significant F in the second row indicates a progressive change in k. The sum of squares for error is in effect an approximate χ² test of the homogeneity of the component distributions, after allowing for a linear trend of y' on x' with a non-zero intercept.
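The analysis of variance above can be sketched directly from the weights and variates. This is a minimal illustration, assuming the weights w and the values x', y' are available for each of the g distributions; the function name is hypothetical, and the data in the usage lines are illustrative only, not from the paper. Each F would then be a mean square divided by the error mean square s².

```python
def intercept_test(w, x, y):
    """Rows of the zero-intercept analysis of variance (eqs. 21-24).

    w, x, y: weights and x', y' values of the g distributions.
    Returns (D.F., S.S.) for slope, intercept and error, plus Σ(wy'²).
    """
    g = len(w)
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    swyy = sum(wi * yi * yi for wi, yi in zip(w, y))

    b1_sq = swxy**2 / swxx              # eq. (21): slope through the origin
    dev_xx = swxx - swx**2 / sw         # [wx'²], eq. (22)
    dev_xy = swxy - swx * swy / sw      # [wx'y'], eq. (22)
    c = swy**2 / sw                     # eq. (23)
    dev_yy = swyy - c                   # [wy'²], eq. (22)
    b2_sq = dev_xy**2 / dev_xx          # eq. (24): unconstrained slope

    return {
        "slope": (1, b1_sq),
        "intercept": (1, b2_sq + c - b1_sq),
        "error": (g - 3, dev_yy - b2_sq),
        "total": swyy,
    }


# Illustrative data: y' = 1 + x' exactly, so the intercept row takes up the
# whole departure from a line through the origin and the error is zero.
res = intercept_test([1, 2, 1, 3], [1, 2, 3, 4], [2, 3, 4, 5])
print(res["intercept"], res["error"])
```

By construction the three sums of squares add to Σ(wy'²), which makes a convenient arithmetic check on hand calculations of the kind shown beneath Table 5.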

Both tests are illustrated in Table 1. By cumulative multiplication of the columns for y' and wy', Σ(wy'²) = 49.5943, of which B₁² = 30.3806 could be attributed to the slope of the fitted line, the remainder being χ² = 19.214 with 14 D.F. The estimate k_c could be accepted at once as valid, but since the approximate χ² exceeded its degrees of freedom, the zero intercept has been tested. By cumulative division of each wx' by its corresponding x', we obtain Σw = 0.0063082, required in computing [wx'²], [wx'y'] and [wy'²] with equations (22) and (23), and from them B₂² = 14.4831, the variation attributable to a slope not constrained to pass through zero. The resulting analysis of variance at the bottom of the table reveals no trend which would make a zero intercept untenable.

The catches of haddock (Table 2), the number of potato beetles within plots (Table 4) and the subplot counts of leather-jackets (Table 6) were equally consistent with a common k_c, as judged by their respective χ² values. In contrast, the wireworm counts in Table 3 and the viral lesions in Table 5 were characterized by significant non-zero intercepts, with the residual variation far in excess of expectation for the wireworms but well within the sampling error for the viral lesions. In both examples, 1/k_i decreased approximately linearly as log x̄ increased: the unweighted regression in Fig. 4 accounted for 68% of the variation in 1/k_i, with b = −0.626 ± 0.091, and that in Fig. 5, where each point was weighted by the number of seedlings (f), for 67% of the variation, with b = −0.111 ± 0.020. When log(1/k_i) was plotted against log x̄, the two linear regressions accounted for 66 and 62% of the variation, respectively, but were sufficiently sensitive to small values of 1/k_i that the counts in Table 3 for fields nos. 14 and 16, with similar x̄, have been combined. Neither regression in Fig. 6 differed significantly in slope from −0.5 (b = −0.409 ± 0.065 for wireworms and b = −0.629 ± 0.127 for viral lesions). Either regression would discredit the suitability of a single, unqualified k.

Heterogeneity in k may be due to other sources. In data from an insecticidal field test, the treated series may have a relatively stable k that differs from that for the untreated controls but can be used in comparisons of the treated plots. Quadrat counts of hemlock seedlings in plots exposed to sunlight during part of the day had a different k from those in total shade (Olson, 1954). In some cases, heterogeneity can be traced to an occasional 'wild' value, which, if justifiably an outlier, may be omitted in computing k. When the heterogeneity persists and a single k is warranted by the F test for B₁², the weighted k_c may be no better as an estimate than the harmonic mean of the individual unweighted k's. To balance the weighting appropriate for a homogeneous series against equal weighting, semi-weights or partial weights may be considered (Cochran, 1954).

Fig. 5. Relation of the observed 1/k_i = y'/x' to log ū from the grouped distributions of viral lesions in Table 5, where ū is the mean number of lesions per half-leaf in each grouping interval.

Fig. 6. Relation of 1/k_i in logarithms to log x̄ for the data in Figs. 4 and 5. The fitted regressions (solid lines) do not differ significantly from a slope of b = −0.5 (broken lines). The shaded circle in the wireworm plot represents two fields and has been given double weight in fitting the line.

(2·6). Precision of k_c. Since the distribution of 1/k_c is the more nearly symmetrical, the standard errors of k_c are computed in terms of 1/k_c. By usual regression theory we have for the variance of the slope

$$V(1/k_c) = \frac{1}{\Sigma(wx'^2)}, \qquad (25)$$

when the data are consistent with a common k as judged by χ². If χ² with n D.F. exceeds its expected value at, say, P < 0.1, but the regression meets the test for a zero intercept, an approximate variance may be computed as

$$V(1/k_c) = \frac{\chi^2}{n\,\Sigma(wx'^2)}, \qquad (25a)$$

although the weights used in computing 1/k_c are now of doubtful validity. If the variance is computed with equation (25), confidence limits for 1/k_c are obtained by multiplying the square root of this variance by the normal deviate (i.e. Student's t for n = ∞) for the selected level of probability; these limits are then inverted to obtain the limits for k_c. If the variance of the slope is increased by χ²/n (equation (25a)), Student's t is that for the degrees of freedom in χ², and the limits are approximate at best. Confidence limits for homogeneous series are illustrated in Tables 1, 2, 4 and 6.
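A minimal sketch of equations (25) and (25a) follows; the function name and argument layout are hypothetical. It reproduces the limits printed beneath Table 4, taking Σ(wx'²) = 2768.528, the value consistent with that table's k_c and B₁², and those beneath Table 2. When the lower limit for 1/k_c is not positive, the upper limit for k_c is returned as infinite.

```python
import math


def k_limits(s_wxx, s_wxy, z=1.96, chi2=None, n=None):
    """Confidence limits for 1/k_c and for k_c (eqs. 25 and 25a).

    If chi2 and n are given, the variance is inflated by chi2/n (eq. 25a), and
    z should then be Student's t for n D.F., supplied by the caller.
    """
    inv_k = s_wxy / s_wxx                 # slope, 1/k_c
    var = 1.0 / s_wxx                     # eq. (25)
    if chi2 is not None and n:
        var *= chi2 / n                   # eq. (25a)
    half = z * math.sqrt(var)
    lo_inv, hi_inv = inv_k - half, inv_k + half
    k_low = 1.0 / hi_inv                  # inverting swaps the ends
    k_high = 1.0 / lo_inv if lo_inv > 0 else math.inf
    return (hi_inv, lo_inv), (k_low, k_high)


inv_lim, k_lim = k_limits(2768.528, 546.1114)   # Table 4
print([round(v, 4) for v in inv_lim], [round(v, 2) for v in k_lim])
```

Working on the 1/k_c scale and inverting only at the end is what produces the markedly asymmetric limits for k_c, such as 9.23 and 1190 in Table 6.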


3. A COMMON k FROM A TEST FOR ADDITIVITY 


Not infrequently, the results of a field experiment are recorded as counts and evaluated by an analysis of variance. Although the analysis is sometimes computed in terms of square roots, which would stabilize the variance of a Poisson-type count, the data are commonly over-dispersed and need a transformation appropriate to the negative binomial. This requires an estimate of k. Most experimental designs, however, lack the basis for a regression estimate of k_c, so that we need a new approach.

Two transformations have been proposed for stabilizing the variance of negative binomial distributions, both depending upon an estimated common k. The simpler of these, and the only one considered here, is the logarithmic transformation described by Anscombe (1948), y = log(u + ½k). A somewhat more effective transform is that given by Beall (1942), y = sinh⁻¹√(u/k), which he has tabled for different values of u and of 1/k. When k is known, either transformation should stabilize the variance effectively and, judging from experience, lead concurrently to additivity. Of the two objectives, additivity is often deemed the more important and, moreover, can be tested conveniently. This suggests reversing the usual order and, by successive approximation, selecting the value of k that gives additivity in terms of the log transform.
The test for non-additivity is that described by Tukey (1949, 1955) for a type of systematic non-additivity occurring in experiments in randomized blocks, Latin squares and other designs. In a cross-classification, for example, a mean square for non-additivity with 1 D.F. is isolated from the interaction of rows by columns and compared by an F test with the mean square from the remaining interactions. The non-additivity represents the regression of the random element in each cell of the table upon the product of two deviations from the general mean, that of the column mean and that of the row mean. If y is an individual measurement, each random element e is defined as e = y − ȳ_b − ȳ_t + ȳ, where ȳ_b is the mean of its block, ȳ_t that of its treatment, and ȳ is the general mean. The corresponding product of the deviations of the two means for each cell is computed as x = (ȳ_b − ȳ)(ȳ_t − ȳ). The slope of e on x is [xe]/[x²], and the variation it accounts for, B_a² = [xe]²/[x²], is the sum of squares for non-additivity.

Plotting each e against x enables one to see what is happening, but by sacrificing this information we can simplify the calculation quite materially for randomized groups or blocks (Bliss & Calhoun, 1954). The standard table of y, with its usual marginal totals, T_b for the f blocks and T_t for the h treatments, and its overall total T = Σy for N = hf counts, is augmented either by a column of Σ(T_t y) for each block or by a row of Σ(T_b y) for each treatment, for calculating the sum of products Σ(T_b T_t y). The total of this extra column or row of products is equal to ΣT_t² or to ΣT_b², respectively. The remaining calculation is primarily that for a standard analysis of variance for randomized blocks. It is summarized in the following work-form, where

$$S_a = \Sigma(T_b T_t y) - T(S_b + S_t + C_m).$$

The regression of e upon x then has the slope b = S_a/(S_b S_t).


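Row | Term           | D.F.      | S.S.                    | M.S. | F
1   | Blocks         | f − 1     | ΣT_b²/h − C_m = S_b     |      |
2   | Treatments     | h − 1     | ΣT_t²/f − C_m = S_t     |      |
3   | Non-additivity | 1         | S_a²/(S_b S_t N) = B_a² | B_a² | B_a²/s²
4   | Error          | N − h − f | [y²] − S_b − S_t − B_a² | s²   |
5   | Total          | N − 1     | Σy² − C_m = [y²]        |      |
6   | Correction     | 1         | T²/N = C_m              |      |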
The short test for non-additivity has been applied in Table 7 to the initial counts per plot of leather-jackets, which are designated as Σu = U in the regression analysis of Table 6 but for this purpose will be called y. The sum of products for the first block, for example, is

Σ(T_t y) = 92 × 501 + 66 × 376 + ... + 25 × 59 = 80,858.
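The work-form translates almost line for line into code. The sketch below (hypothetical function name) takes rows as blocks and columns as treatments, builds the extra column Σ(T_t y), and returns the entries of rows 1 to 4; applied to the Table 7 counts, it gives S_b, S_t and B_a² matching Table 7 and an error sum of squares within one unit of the printed 5,054.

```python
def nonadditivity_rb(table):
    """Tukey's one-degree-of-freedom test for randomized blocks, following the
    work-form: rows of `table` are blocks, columns are treatments."""
    f, h = len(table), len(table[0])
    n = f * h
    t_b = [sum(row) for row in table]                        # block totals T_b
    t_t = [sum(row[j] for row in table) for j in range(h)]   # treatment totals T_t
    t = sum(t_b)
    c_m = t * t / n                                          # row 6
    s_b = sum(v * v for v in t_b) / h - c_m                  # row 1
    s_t = sum(v * v for v in t_t) / f - c_m                  # row 2
    total = sum(y * y for row in table for y in row) - c_m   # row 5
    # Extra column Σ(T_t y) for each block; its total equals ΣT_t².
    extra = [sum(tt * y for tt, y in zip(t_t, row)) for row in table]
    s_a = sum(tb * e for tb, e in zip(t_b, extra)) - t * (s_b + s_t + c_m)
    b_a_sq = s_a**2 / (n * s_b * s_t)                        # row 3
    error = total - s_b - s_t - b_a_sq                       # row 4
    df_error = n - f - h
    return {"S_b": s_b, "S_t": s_t, "B_a^2": b_a_sq, "error": error,
            "df_error": df_error, "F": b_a_sq / (error / df_error), "extra": extra}


# Table 7: leather-jacket totals U (= y); rows are blocks 1-6.
TABLE7 = [
    [92, 66, 19, 29, 16, 25],
    [60, 46, 35, 10, 11, 5],
    [46, 81, 17, 22, 16, 9],
    [120, 59, 43, 13, 10, 2],
    [49, 64, 25, 24, 8, 7],
    [134, 60, 52, 20, 28, 11],
]
res = nonadditivity_rb(TABLE7)
print(res["extra"][0])   # 80858, the first block's Σ(T_t y)
print({k: round(v) for k, v in res.items() if k in ("S_b", "S_t", "B_a^2", "error")})
```

Only the block totals, treatment totals and one extra column are needed, which is the economy the short method is designed to provide.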


As a second example, eight dummy treatments have been assigned at random to the eight plots in each block of the data underlying Table 1 and tested for non-additivity, substituting the original counts (u) of the potato beetles in each plot for y in the work-form. With the randomization tested in Table 8, the original counts did not meet the requirement of additivity. Ten additional randomizations of the same original counts gave variance ratios


Table 7. Test for non-additivity applied to the total count U (= y) of leather-jackets in each plot (Bartlett, 1936)

          | y at each % of toxicant
Block no. | 0   | 0   | 0.2 | 0.4 | 0.5 | 0.6 | T_b      | Σ(T_t y)
1         | 92  | 66  | 19  | 29  | 16  | 25  | 247      | 80,858
2         | 60  | 46  | 35  | 10  | 11  | 5   | 167      | 56,495
3         | 46  | 81  | 17  | 22  | 16  | 9   | 191      | 61,300
4         | 120 | 59  | 43  | 13  | 10  | 2   | 247      | 93,059
5         | 49  | 64  | 25  | 24  | 8   | 7   | 177      | 57,345
6         | 134 | 60  | 52  | 20  | 28  | 11  | 305      | 105,127
T_t       | 501 | 376 | 191 | 118 | 89  | 59  | 1334 = T | 454,184 = ΣT_t²

S_b + S_t + C_m = 78,055, S_a = Σ(T_b T_t y) − T(S_b + S_t + C_m) = 2,188,160, N S_b S_t = 2,229,810,016;
ȳ = 1334/36 = 37.06, x' = 1367.6, y' = 173.54, k = 7.881.

Row | Term           | D.F. | S.S.
1   | Blocks         | 5    | 2,358 = S_b
2   | Treatments     | 5    | 26,265 = S_t
3   | Non-additivity | 1    | 2,147 = B_a²
4   | Error          | 24   | 5,054
5   | Total          | 35   | 35,824
6   | Correction     | 1    | 49,432 = C_m


for non-additivity ranging from F = 0.08 to 8.54, with a median F = 2.32. In the absence of real differences between treatments, the test is relatively sensitive to chance variations in the arrangement of plots within blocks.

(3·2). Provisional k for transforming counts. Given the above test procedure, the required ½k may be computed, by successive approximations, as the value giving zero non-additivity when the count in each plot (u) is transformed to y = log(u + ½k). Occasionally, the trial value for starting the iteration can be based upon past or concurrent experience with other similar tests, or a well-founded a priori k may be accepted without change if the non-additivity is no larger than the residual error. More often, the initial k' must be computed from the evidence of each experiment. Two provisional estimates may be suggested.
For the first, separate counts are recorded on two or more equal subsections of each plot, leading to a trial k based upon subsampling within plots. An intraplot k_c can then be computed by regression, with grouping where desirable, as has been described. Further analyses, however, are based upon the sum of the subsamples in each plot, so that we require not an intraplot but an interplot k. Their relation depends upon the correlation between adjacent subplot units. If half-plots, say, were completely correlated, so that r = 1, both k's would be the same. This condition was approximated in counts of Lespedeza in an old meadow (F. C. Evans, 1952). When the same counts were combined successively in quadrats doubling …

Table 8. Test for non-additivity of the original plot counts u from the uniformity data in Table 1 when dummy treatments 1–8 were assigned at random to plots in each block

Analysis of variance

Term                 | D.F. | S.S.    | M.S.    | F
Blocks               | 15   | 41,840  | 2,789.3 | 12.64
Treatments           | 7    | 1,791   | 255.9   | 1.16
Non-additivity, B_a² | 1    | 1,572   | 1,572   | 7.12
Error                | 104  | 22,957  | 220.74  |
Total                | 127  | 68,160  |         |
Correction           | 1    | 287,187 |         |

S_b + S_t + C_m = 330,819, S_a = 3,883,407, N S_b S_t = 9,594,121,743; x̄ = 6063/128 = 47.367, x' = 2241.9, y' = 173.37, k = 12.931.


… approximate upper limit to the k between plots. In the counts of Leptinotarsa, k_c = 5.07 within plots … Nk_c = 10 × 5.07. The observed k between plots (13.51) falls within …

For the second trial estimate, the mean square for the residual error gives s², and all of the observations an overall mean x̄, from which we can compute x' and y' in equation (3) and a trial value of k. For the uniformity data of Table 8, with x̄ = 47.367 and the error mean square from the analysis of variance, s² = 220.74, we find, by equation (2), k = 12.93, which compares favourably with the regression estimate from the same data of k_c = 13.51 (Table 1). Unlike B_a² for non-additivity, this estimate of k proved relatively insensitive to the randomization, ranging between 11.01 and 13.47 in eleven dummy experiments superimposed upon the same original counts.
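This second trial estimate is a one-line calculation. The sketch below uses a hypothetical function name and takes x' = x̄² − s²/N and y' = s² − x̄, the forms implied by the figures printed under Tables 7 and 8, with s² the error mean square.

```python
def trial_k_from_anova(mean, error_ms, n):
    """Trial k = x'/y' with x' = x̄² − s²/N and y' = s² − x̄ (s² = error mean square)."""
    x = mean**2 - error_ms / n
    y = error_ms - mean
    return x / y


# Table 8: x̄ = 6063/128, s² = 22,957/104, N = 128.
print(round(trial_k_from_anova(6063 / 128, 22957 / 104, 128), 3))  # 12.931
# Table 7, with x̄ rounded to 37.06 as in its footnote and s² = 5054/24.
print(round(trial_k_from_anova(37.06, 5054 / 24, 36), 2))          # 7.88
```

Because s² here is the residual mean square after blocks and treatments, the estimate describes the plot-to-plot variation that the transformation has to stabilize.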
Trial estimates for the leather-jacket counts in Table 7 can also be computed by two procedures. From the regression estimate of the intraplot k_c = 18.32 and N = 2 (Table 6), an interplot k between 18 and 36 would be anticipated, apart from the relatively wide confidence limits of k_c (9.23 and 1190). Alternatively, the mean count (x̄) and error mean square (s²) of the plot counts in Table 7 gave k = 7.88, less than the lower limit predicted from the intraplot variation. When the two estimates differ so widely, the regression technique would be given greater weight in selecting an initial trial value, say ½k' = 9, for starting the iterations.

Table 9. Terms required in approximating ½k for zero non-additivity with the transform y = log(U + ½k), from the data in Table 7

½k'  | C_m     | S_t    | S_a      | B_a
9    | 91.3144 | 2.2490 | −0.10891 | −0.02919
11   | 92.0320 | 2.0394 | −0.05929 | −0.01801
14   | 96.3342 | 1.7538 | 0.04052  | 0.01437
12.6 | 94.3488 | 1.8719 | −0.00433 | −0.00142

Table 10. Plot counts of surviving leather-jackets in Table 7 with metameter y = log(U + ½k) giving zero non-additivity

y = log(U + 12.6) for each dose and block … [entries illegible in the source] … 577.3240
(3·3). Estimation of k from the log counts. Whatever its source, the initial trial value ½k' is added to each plot count; the logarithm of the sum is our metameter y = log(U + ½k'). With the work-form for non-additivity, we compute the sums of squares in rows 6, 1 and 2, their sum (C_m + S_b + S_t), S_a, and the test criterion

$$B_a = \frac{S_a}{\sqrt{S_b S_t N}},$$

from which the value of ½k giving B_a = 0 can be interpolated. Empirical trial suggests that the average of two interpolated estimates, one based upon ½k' and the other upon its reciprocal 2/k', is often nearer to the desired value than either direct or harmonic interpolation alone.
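The two interpolations and their average amount to a few lines of arithmetic. In the sketch below (hypothetical function name), "direct" interpolates linearly in ½k and "harmonic" linearly in 2/k; the Table 9 trials at ½k = 11 and 14 reproduce the values 12.67, 12.49 and 12.6 used in the leather-jacket example.

```python
def interpolate_half_k(c1, b1, c2, b2):
    """Interpolate the ½k giving B_a = 0 from two trials (c1, B_a1) and (c2, B_a2).

    Returns the direct estimate (linear in ½k), the harmonic estimate
    (linear in 2/k), and their average.
    """
    frac = -b1 / (b2 - b1)                # fraction of the way from trial 1 to trial 2
    direct = c1 + frac * (c2 - c1)
    harmonic = 1.0 / (1.0 / c1 + frac * (1.0 / c2 - 1.0 / c1))
    return direct, harmonic, (direct + harmonic) / 2


# Table 9: B_a = -0.01801 at ½k = 11 and +0.01437 at ½k = 14.
d, h, avg = interpolate_half_k(11, -0.01801, 14, 0.01437)
print(round(d, 2), round(h, 2), round(avg, 1))   # 12.67 12.49 12.6
```

Averaging the two brackets the curvature of B_a as a function of ½k from both sides, which is presumably why the text finds it closer to the target than either interpolation alone.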

The estimation of ½k by minimizing non-additivity in the transformed counts is viewed primarily as a device for finding a suitable metameter to be used in analysing a series of counts. In some cases the method has led to negative values for ½k'. Although the form of the underlying distribution may then be in doubt, the transformed data, conforming to the basic assumption of additivity in the analysis of variance, may serve the purposes of the experimenter quite as well as if ½k' were positive. Applications of the technique to field experiments on insecticides have been promising, especially in one case where the weights for a further probit analysis of several dosage-mortality curves were based upon the interpolated k (Bliss, 1958).


Table 11. Analysis of variance of transformed counts in Table 10

Term                      | D.F. | S.S.          | M.S.     | F (vs initial error) | F (vs experimental error)
Blocks                    | 5    | 0.1384 = S_b  | 0.02768  |                      | 1.94
Treatments                | 5    | 1.8720 = S_t  |          |                      |
  Control against treated | 1    | 1.5606        | 1.5606   |                      | 109.59
  Linear on dose          | 1    | 0.2882        | 0.2882   |                      | 20.24
  Non-linear              | 2    | 0.0023        | 0.00115  |                      | 0.08
  Dummy                   | 1    | 0.0208        | 0.0208   | 1.49                 |
Non-additivity            | 1    | 0.000002      | 0.000002 | 0.00                 |
Initial error             | 24   | 0.3353        | 0.01397  | 1.00                 |
Total                     | 35   | 2.3456        |          |                      |
Correction                | 1    | 94.3488 = C_m |          |                      |
Experimental error        | 25   | 0.3561        | 0.01424  |                      | 1.00

S_b + S_t + C_m = 96.3591, S_a = −0.004334, √(S_b S_t N) = 3.05394, B_a = −0.001419.


The estimation of ½k in an experiment on insecticides may be illustrated with the leather-jacket counts in Table 7. As discussed above, we may start with ½k' = 9 as our first trial value, converting each of the N = 36 counts (U) to y = log(U + 9) and computing B_a for non-additivity. The calculation is continued with successive trial values of ½k' until, with ½k' = 11 and 14, the corresponding values of B_a are small and of opposite sign (Table 9). The required ½k' corresponding to B_a = 0 has been interpolated between 11 and 14 as ½k' = 12.67, and again between 1/11 and 1/14 as ½k' = 12.49, the two estimates averaging ½k' = 12.6. The final variates, y = log(U + 12.6), are shown in full in Table 10. The resulting B_a = −0.00142 approximates zero so closely that the sum of squares for non-additivity vanishes in the analysis of variance in Table 11. Two plots in each block represented untreated controls, and the dummy comparison between them has been added to the initial error.
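The search for ½k can also be automated. The sketch below (hypothetical names) recomputes B_a directly for y = log10(U + ½k) from the Table 7 counts and, in place of the paper's two-point interpolation, uses plain bisection between ½k = 11 and 14. Because bisection solves B_a = 0 rather than interpolating, its root need not coincide with the rounded ½k' = 12.6, but it lies in the same bracket. The base of the logarithm does not affect the root, since changing it only rescales B_a.

```python
import math

# Table 7: leather-jacket totals U; rows are blocks 1-6, columns the six treatments.
TABLE7 = [
    [92, 66, 19, 29, 16, 25],
    [60, 46, 35, 10, 11, 5],
    [46, 81, 17, 22, 16, 9],
    [120, 59, 43, 13, 10, 2],
    [49, 64, 25, 24, 8, 7],
    [134, 60, 52, 20, 28, 11],
]


def b_a(table, half_k):
    """Test criterion B_a = S_a / sqrt(S_b S_t N) for y = log10(U + ½k)."""
    y = [[math.log10(u + half_k) for u in row] for row in table]
    f, h = len(y), len(y[0])
    n = f * h
    t_b = [sum(row) for row in y]                          # block totals
    t_t = [sum(row[j] for row in y) for j in range(h)]     # treatment totals
    t = sum(t_b)
    c_m = t * t / n
    s_b = sum(v * v for v in t_b) / h - c_m
    s_t = sum(v * v for v in t_t) / f - c_m
    s_bty = sum(tb * sum(tt * v for tt, v in zip(t_t, row))  # Σ(T_b T_t y)
                for tb, row in zip(t_b, y))
    s_a = s_bty - t * (s_b + s_t + c_m)
    return s_a / math.sqrt(s_b * s_t * n)


def solve_half_k(table, lo, hi, tol=1e-4):
    """Bisection for B_a(½k) = 0, assuming B_a changes sign between lo and hi."""
    f_lo = b_a(table, lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = b_a(table, mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(b_a(TABLE7, 11), b_a(TABLE7, 14))   # opposite signs expected (cf. Table 9)
print(solve_half_k(TABLE7, 11, 14))
```

As the text notes, an exact root is rarely needed: any ½k for which B_a² stays below the error mean square serves the purpose, so a coarse tolerance is usually enough.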
To gain some idea of the variation in ½k' when determined by minimizing B_a in the absence of differences between treatments, we have analysed the uniformity data in Table 1 similarly, with eight dummy treatments assigned at random to the eight plots in each of the sixteen blocks. In eleven independent randomizations of the same data, the fluctuation in the variance ratio for B_a² when computed from the initial counts u, without transformation, has already been noted. Even greater variability appeared in the values of ½k', which ranged in the eleven 'experiments' from −4.8 to more than 120 but with a median of ½k' = 7.4 (k' = 14.8), in good agreement with k_c = 13.51 in Table 1.

The wide range in ½k' from identically the same counts in different randomized combinations indicates the sensitivity of this estimate to random variation when there are no real differences between treatments. Even when the plot counts differ markedly with treatment, quite different values of ½k' may have relatively little effect upon comparisons between treatments. The leather-jacket counts in Table 7, for example, when analysed with ½k' = 5, gave F = 0.51 for non-additivity instead of F = 0.0001 with ½k' = 12.6, but the three treatment effects yielded variance ratios of F = 105.82, 24.19 and 0.26, quite similar to F = 109.59, 20.24 and 0.08 for the same comparisons in Table 11. In practice, an exact determination of the ½k giving B_a = 0 is probably unnecessary. Any value for which B_a² < s² should meet the needs of the experimental biologist.


4. SUMMARY 


'The most widely applicable of the over-dispersed distributions, the negative binomial, is 
defined by the arithmetic mean and a parameter k. Comparisons between the means of two 
or more distributions are more direct and unequivocal if they have the same relative dis- 
persion in terms of k. Two approaches to a common k are described and illustrated with 
numerical examples. 

The first is a regression moment estimate that is applicable when the relevant variation 
can be sampled. Two statistics, x’ and /, are computed from the mean and variance of each 
component distribution, such that their ratio, y'[z' = 1/k,, is an estimate of 1 Ik and the 
difference (/ —a'|k) has zero expectation. Given two or more component een 
a common 1/ % can be estimated by successive approximations from the slope of y upon w, 
when the regression is constrained to pass through the origin and each / is weighted by its 
invariance. Agreement with a single k, can be judged by a x? test of the variation about the 
regression, by agreement with a zero intercept, and by independence between / from the 
component samples and their count means vw. i 

A second estimate of the common k is proposed for field experiments arranged in ran- 
domized blocks or other restricted designs and evaluated by an analysis of variance in terms 
of Anscombe’s transform, y = log(x + ½k), of the plot counts. It is proposed to estimate
the k in this transformation, again by iteration, as that value giving zero non-additivity
(B₁ = 0) in the regression test described by Tukey. A simple form of the calculation is
described for randomized blocks. Since the resulting k provides the biologist with an additive 
metric for analysing his experiment, its possible limitations as an estimate are considered 
secondary. 


This project was undertaken and much of it completed in 1953, when the senior author 
was a guest of the Department of Genetics at Cambridge University, on leave from the 
Connecticut Agricultural Experiment Station and Yale University. The authors are
grateful to Sir Ronald A. Fisher, John W. Tukey, Frank J. Anscombe and the referee for
their suggestions and assistance.
SIMPLIFIED METHODS OF FITTING THE TRUNCATED
NEGATIVE BINOMIAL DISTRIBUTION


By W. BRASS
University of Aberdeen 


1. INTRODUCTION 


The negative binomial distribution is frequently used to fit sample data. In some circum- 
stances the sample may be truncated at the lower end because the number of observations 
in the class with zero measurement cannot be isolated. Sampford (1955) gives as an example 
the distribution of breaks in irradiated cells which are at a particular stage of the mitotic 
cycle; cells not susceptible to breakage cannot be distinguished from susceptibles in which 
no break occurs. A further example from demographic research has recently been presented. 
Brass (1957) has shown that, in some circumstances, the number of children born per woman
in a cohort of completed fertility, where all the women have been exposed to risk, is
distributed, to a good approximation, in the negative binomial form. In most populations,
it is not possible to sample only the women exposed to risk, and the zero class therefore
cannot be accurately determined. The distribution of children per mother, however, follows
the truncated negative binomial form.

Sampford (1955) has given methods for estimating the parameters of the truncated 
negative binomial distribution, by the use of the first two sample moments, and also from 
the maximum likelihood equations. By these methods the parameters are obtained, in 
each case, from the solution by successive numerical approximations of two equations. The 
solution of the moment equations is fairly laborious and that of the maximum likelihood 
ones considerably more so. 

This paper considers simplified methods of fitting the truncated negative binomial dis- 
tribution. Reasonably efficient estimates, which are easily calculated, will always be 
useful (a) for exploratory work when it is not clear which type of distribution should be 
fitted, and (b) to provide first-stage values in the iterative solution of the maximum likeli- 
hood equations. In some instances, even in the final stages of an investigation, the simplified 
method will be all that is needed. Whether the extra work required to find maximum 
likelihood estimates is justified by the gain from the increase in precision can only be 
decided by a balancing of advantages in each particular case. 


2. METHOD A. MODIFICATION OF EQUATIONS FOR ESTIMATION BY MOMENTS 


In Fisher's (1941) notation, the truncated negative binomial distribution has the form

$$P(r) = \frac{w^k}{1-w^k}\,\frac{(k+r-1)!}{r!\,(k-1)!}\,\eta^r \qquad (r = 1, 2, \ldots), \qquad (1)$$

where η = 1 − w.
The factorial moments are

$$\mu_{[j]} = \frac{(k+j-1)!}{(k-1)!}\,\frac{\eta^j}{w^j(1-w^k)},$$

and the first two moments about the origin

$$\mu'_1 = \frac{k\eta}{w(1-w^k)}, \qquad \mu'_2 = \frac{k\eta(1+k\eta)}{w^2(1-w^k)}. \qquad (2)$$

We will also write for the proportion in the first class of the truncated distribution

$$P = \frac{k\eta\,w^k}{1-w^k}. \qquad (3)$$


The main difficulty in solving the equations (2) for the moment method of estimating the
parameters comes from the component w^k in the above expressions. If the third moment is
used to eliminate w^k from the relations, estimates of w and k in terms of the first three sample
moments are found very easily, as pointed out by David & Johnson (1952). However, as
these authors emphasize, the method is very inefficient because of the weight given to the
third sample moment.

Elimination of w^k from the equations can be achieved by many methods which do not
involve the third moment. The simplest seems to be by the use of the proportion in the
first class of the truncated distribution. If the expression for w^k from equation (3) is sub-
stituted in (2), then, since 1/(1 − w^k) = (kη + P)/(kη), we have μ′₁ = (kη + P)/w and
μ′₂ = (1 + kη)(kη + P)/w², so that μ₂ = μ′₂ − μ′₁² = (1 − P)(kη + P)/w². This gives the
following equations for the parameters in terms of the population moments and P:

$$w = \frac{\mu'_1(1-P)}{\mu_2}, \qquad k = \frac{w\mu'_1 - P}{\eta}, \qquad (4)$$

where μ₂ is the second moment about the mean.
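Equations (4) can be checked numerically by computing μ′₁, μ₂ and P directly from (1). The sketch below is illustrative only. The function names are hypothetical, and the pmf is summed up to an arbitrary cut-off `r_max`; the neglected tail is negligible for the values used. The values w = 0.5, k = 3 are arbitrary test values.

```python
import math


def truncated_nb_pmf(r, w, k):
    """P(r) of (1) for r >= 1: zero-truncated negative binomial with eta = 1 - w."""
    eta = 1.0 - w
    log_p = (k * math.log(w) - math.log1p(-(w ** k))
             + math.lgamma(k + r) - math.lgamma(k) - math.lgamma(r + 1)
             + r * math.log(eta))
    return math.exp(log_p)


def population_summaries(w, k, r_max=500):
    """Mean mu'_1, variance mu_2 and first-class proportion P by direct summation."""
    probs = [truncated_nb_pmf(r, w, k) for r in range(1, r_max + 1)]
    mean = sum(r * p for r, p in enumerate(probs, start=1))
    var = sum((r - mean) ** 2 * p for r, p in enumerate(probs, start=1))
    return mean, var, probs[0]


w, k = 0.5, 3.0
mean, var, P = population_summaries(w, k)
w_from_4 = mean * (1.0 - P) / var                      # first equation of (4)
k_from_4 = (w_from_4 * mean - P) / (1.0 - w_from_4)    # second equation of (4)
print(w_from_4, k_from_4)                              # should reproduce w and k
```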
Replacement of the moments and P by sample values leads to very simple estimates of
w and k,

$$\bar w = \frac{m(1 - n_1/n)}{s^2}, \qquad \bar k = \frac{\bar w m - n_1/n}{1 - \bar w}, \qquad (5)$$

where the bars denote the estimates, n_i is the number of sample observations with measurement i,
and n the total sample number. m and s² are the unbiased sample estimates of μ′₁ and μ₂,

$$m = \sum_{i \ge 1} \frac{i\,n_i}{n}, \qquad s^2 = \sum_{i \ge 1} \frac{n_i(i - m)^2}{n - 1},$$

respectively. This is called, for convenience, Method A.
w̄ and k̄ are consistent estimates of w and k. When n is large the effect of bias will be slight and
samples for which equations (5) cannot be solved will be very improbable.
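Method A needs only m, s² and n₁/n, so it reduces to a few lines of arithmetic. In the sketch below, the function names are hypothetical. `sample_summaries` computes the three statistics from a frequency table {i: n_i}, and `method_a` applies (5). It raises an error when w̄ falls outside (0, 1), which is one of the cases in which (5) has no admissible solution. The call uses the summary statistics of the example in section 8, since the underlying frequency table is not reproduced here.

```python
def sample_summaries(counts):
    """m, s^2 (divisor n - 1) and n1/n from a frequency table {i: n_i} with i >= 1."""
    n = sum(counts.values())
    m = sum(i * c for i, c in counts.items()) / n
    s2 = sum(c * (i - m) ** 2 for i, c in counts.items()) / (n - 1)
    return m, s2, counts.get(1, 0) / n


def method_a(m, s2, p1):
    """Method A, equations (5): w = m(1 - p1)/s^2 and k = (w m - p1)/(1 - w)."""
    w = m * (1.0 - p1) / s2
    if not 0.0 < w < 1.0:
        raise ValueError("equations (5) give an inadmissible w for this sample")
    k = (w * m - p1) / (1.0 - w)
    return w, k


# Summary statistics of the Kwimba example in section 8.
w_bar, k_bar = method_a(3.9912, 5.9734, 0.1441)
print(w_bar, k_bar)
```

Carried to full precision this gives w̄ ≈ 0.5719 and k̄ ≈ 4.995. The value 5.00 quoted in section 8 comes from rounding w̄ to 0.572 before computing k̄.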


3. EFFICIENCY OF METHOD A


By the use of differentials the asymptotic variances and covariances of the estimates can
be obtained, in the usual way, in terms of the variances and covariances of the first two
sample moments and the proportion of observations in the first class, which are

$$nV(m) = \mu_2, \qquad nV(s^2) = \mu_4 - \mu_2^2, \qquad nV(n_1/n) = P(1-P),$$

$$n\,\mathrm{cov}(m, s^2) = \mu_3, \qquad n\,\mathrm{cov}(m, n_1/n) = P(1-\mu'_1), \qquad n\,\mathrm{cov}(s^2, n_1/n) = P\{(1-\mu'_1)^2 - \mu_2\}.$$


This leads after reduction to
s mw? 
V(w) = nl - Pj (gk +P) et 1) 9+ Pl—4— 3k + 29 + tyk gi?) P3 E)), 
k+1 


V(k) = FFC 3E + dk + Skt + yk} + PA k + 3y 391], 
= kl 
eov (il) = sis Plat- -a. 2n + 40 gl) - PE Y). 


(6) 
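The reduction leading to (6) is an application of the delta method, and it can also be carried out numerically instead of algebraically. The sketch below assumes the moment covariances listed above. It evaluates the moments of the truncated distribution by direct summation up to an arbitrary cut-off and returns n times the asymptotic covariance matrix of (w̄, k̄). The function names are hypothetical. Because n cancels in ratios, the output can be compared with the example of section 8 without knowing the sample size. There, V(k̄)/V(w̄) = 1.3902/0.003104 ≈ 448 and cov(w̄, k̄)/V(w̄) = 0.06472/0.003104 ≈ 20.85, and the numerical ratios should agree closely with these.

```python
import math


def truncated_nb_moments(w, k, r_max=1000):
    """mu'_1, central moments mu_2, mu_3, mu_4, and P for the truncated distribution (1)."""
    eta = 1.0 - w
    log_norm = k * math.log(w) - math.log1p(-(w ** k)) - math.lgamma(k)
    rs = range(1, r_max + 1)
    probs = [math.exp(log_norm + math.lgamma(k + r) - math.lgamma(r + 1) + r * math.log(eta))
             for r in rs]
    mean = sum(r * p for r, p in zip(rs, probs))
    mu2, mu3, mu4 = (sum((r - mean) ** j * p for r, p in zip(rs, probs)) for j in (2, 3, 4))
    return mean, mu2, mu3, mu4, probs[0]


def method_a_asymptotic_cov(w, k):
    """n times the asymptotic covariance matrix of (w_bar, k_bar), by the delta method."""
    mu, mu2, mu3, mu4, P = truncated_nb_moments(w, k)
    # n times the covariance matrix of (m, s^2, n1/n), as listed in section 3.
    c_mp = P * (1.0 - mu)
    c_sp = P * ((1.0 - mu) ** 2 - mu2)
    sigma = [[mu2, mu3, c_mp],
             [mu3, mu4 - mu2 ** 2, c_sp],
             [c_mp, c_sp, P * (1.0 - P)]]
    # Gradient of w = m(1 - p)/s^2 at the population values (where w = mu(1 - P)/mu2).
    grad_w = [(1.0 - P) / mu2, -w / mu2, -mu / mu2]
    # k = (w m - p)/(1 - w): chain rule through w plus the direct m and p terms.
    dk_dw = (mu - P) / (1.0 - w) ** 2
    grad_k = [dk_dw * grad_w[0] + w / (1.0 - w),
              dk_dw * grad_w[1],
              dk_dw * grad_w[2] - 1.0 / (1.0 - w)]
    grads = (grad_w, grad_k)
    return [[sum(a[i] * sigma[i][j] * b[j] for i in range(3) for j in range(3))
             for b in grads] for a in grads]


ncov = method_a_asymptotic_cov(0.572, 5.0)
print(ncov)
print(ncov[1][1] / ncov[0][0], ncov[0][1] / ncov[0][0])
```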


The asymptotic efficiency of this method of fitting will show how much information is 
lost when it is applied, in place of the maximum likelihood procedure, with large samples. 
The determinant of the variance-covariance matrix of the estimates is 


__ (e+ 1) wa m 
n*(1 — P)? (gk +P)’ 
where 
G = 2yk? + kP(2 +0 gk + 4n? + 5% + yk) + PA — 2k + 6y + 79k + yk? — E- . 


The corresponding determinant of the maximum likelihood estimates is 


ee MAU ae (8) 
n*(gk +P)? ((1—P) .L —5P(1 -+n w[5)*] 


Sr -l 
where L= Dr Kar- hl: 
and the efficiency of the method is 
E ky - P}? (9) 


= *1G[-P)L-7PQ *-Inw[y*]' 


Table 1 gives this efficiency for various values of k and the mean M of the complete
negative binomial distribution,

$$M = k\eta/w.$$

For fixed k, when M tends to zero, the distribution becomes concentrated in the first 
class and the efficiency tends to 100 %. For fixed M, when k tends to zero, the efficiency 
tends to zero, but so slowly that no guidance is given to the levels, for values of k which 
would be met in practice. 

For fixed M when k → ∞, P → P₀ = M/(e^M − 1) and the distribution tends to the truncated
Poisson form. Then


tiara 


1 — 
Be- E I-P, 2 40—Py 
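The limit P → P₀ can be seen numerically by holding M fixed and letting k grow. From M = kη/w, holding M fixed means taking w = k/(k + M). The minimal sketch below uses a hypothetical function name, and M = 1 is chosen arbitrarily. As k increases, the printed P approaches P₀ = 1/(e − 1) ≈ 0.582.

```python
import math


def first_class_proportion(M, k):
    """P of (3) when w is chosen so that the complete-distribution mean k*eta/w equals M."""
    w = k / (k + M)                  # solves M = k(1 - w)/w for w
    wk = w ** k
    return k * (1.0 - w) * wk / (1.0 - wk)


M = 1.0
p0 = M / math.expm1(M)               # truncated Poisson limit M/(e^M - 1)
for k in (1.0, 10.0, 1000.0, 1e5):
    print(k, first_class_proportion(M, k), p0)
```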


For fixed k when M → ∞


=1))k! 
y^ uke DE AED 


r=2 
These limits give useful guidance to the efficiencies for higher values of M and k and are
shown in the last row and column of the table.


Table 1. Percentage efficiency of estimation by Method A (modified moment)

        k    0.5      1      2      3     10      ∞
  M
  0.5       93.4   97.1   98.9   99.3   99.3   98.6
  1         88.0   93.6   97.1   98.1   98.0   97.4
  2         81.0   88.2   93.7   95.7   97.4   96.1
  5         70.9   78.8   86.5   90.4   97.2   98.5
  10        63.6   70.9   78.9   83.7   95.1  100.0
  ∞         22.7   38.8   57.5   67.6   88.0  100.0


Over a considerable region of values of the parameters the efficiency is high. For fixed 
M, it rises with k to a maximum value only a little below 100%; beyond this, any fall is very
slight. The efficiency decreases towards the lower left-hand corner of the table as η becomes
closer to one and the distribution widely spread.


Table 2. Percentage ratio of efficiency of estimation by Method A
to that of the moment method

        k    0.5      1      2      3      4      5     10      ∞
  M
  0.5      112.9  107.1  103.3  101.9  101.1  100.6   99.7   98.6
  1        117.9  110.5  105.1  102.8  101.6  100.8   98.8   97.4
  2        122.5  113.9  107.0  104.0  102.3  101.2   98.8   96.1
  5        126.9  117.2  109.2  105.8  104.0  102.8  100.5   98.5
  10       128.3  117.6  108.7  105.1  103.3  102.2  100.6  100.0
  ∞        100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0


The determinant of the variance-covariance matrix of the moment estimates is 


(b+ D [21 —P)—9(b-+1) P] 2 
n*(nk + Py [1+ P{k+(k+ 1) In w]gy ao 


Division of this by the expression (7) gives the ratio of the efficiency of estimation by
Method A to that of the moment method,


P0 — P)?[2(1 — P)—n(k +1) P] T 
GU + Pk + (k+ 1)Inwjgjje : ay 


The ratio, shown as a percentage in Table 2, tends to 100% as M goes to infinity and rises as
k becomes very small. When k is large it approaches the values in the last column of Table 1,
since the efficiency of the moment method then tends to 100%. Except for large k, when the
ratio falls below 100% by a few per cent, Method A is rather more efficient than fitting by
the first two moments.
When k becomes very small it is substantially better but, in this region, the efficiency of 
both methods is low. 


4. EFFICIENCY OF ESTIMATION OF THE MEAN OF THE COMPLETE DISTRIBUTION 


Often when distributions are fitted, the main interest is not in the overall precision of the 
method, but the efficiency of estimation of some particular parameter, i.e. of some function
of w and k. When the overall efficiency is close to 100%, that for each function will also be
near this. When the overall efficiency is low, it does not follow that this is also true for the
function considered. The efficiencies of estimation, by Method A, of both w and k, however,
become low when η tends to one, the decrease being less rapid for the former.

A parameter which will usually be of particular importance is the mean of the complete 
distribution, M. For example, this will give the mean breakages per susceptible cell, and 
live births per woman exposed to risk, in the problems cited in the introduction. Special 
consideration will be given to this parameter. 

The estimate of M by Method A is obtained from 


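$$\bar M = \frac{\bar k\,\bar\eta}{\bar w} = m - \frac{n_1/n}{\bar w} = m - \frac{n_1}{n}\cdot\frac{s^2}{m(1 - n_1/n)} \qquad (12)$$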
and v = ae pg | ay -P ib (13) 
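Estimate (12) follows from (5) by substituting the Method A estimates into M = kη/w. Since k̄η̄ = w̄m − n₁/n, it simplifies to m − (n₁/n)/w̄. The sketch below uses a hypothetical function name and computes both forms as a check. With the summary statistics of section 8 it gives M̄ ≈ 3.74.

```python
def method_a_mean(m, s2, p1):
    """Estimate (12) of M = k*eta/w from the Method A estimates (5), in two equivalent forms."""
    w = m * (1.0 - p1) / s2
    k = (w * m - p1) / (1.0 - w)
    via_k = k * (1.0 - w) / w          # k_bar * eta_bar / w_bar
    direct = m - p1 / w                # simplified form, using k_bar*eta_bar = w_bar*m - p1
    return via_k, direct


via_k, direct = method_a_mean(3.9912, 5.9734, 0.1441)
print(via_k, direct)
```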
The variance of the maximum likelihood estimate is 
keq [L 7 P(1+Inw/9)?] (14) 


w*(nk + PP [(1—P) L—9P(1 -nw[g)'] 
and the efficiency of the estimation of M by Method A is then 


[L -nP In w/9] 
P FVI 7 3p (15) 
(Q—P)L—yP(1 +Inw/7)"] E * (—P)yk* (=) 3 | 
This is shown as a percentage in Table 3 for the selected values of M and k. When k → 0,
for a fixed M, the efficiency tends to zero, but very slowly. When k is greater than 0.5 the
efficiency exceeds 90%, and over a large part of the region it is within a per cent or two of
100%.
Table 3. Percentage efficiency of estimation of M by Method A

        k    0.5      1      2      3      4      5     10      ∞
  M
  0.5       97.0   99.0   99.7   99.8   99.7   99.7   99.3   98.6
  1         95.4   98.2   99.2   99.3   99.2   99.0   97.5    ...
  2         93.6   97.0   98.2   98.3   98.2   98.0    ...    ...
  5         91.6   95.9   97.7   98.2   98.5   98.7   99.1    ...
  10        91.3   96.4   98.6   99.2   99.5   99.7   99.9    ...
  ∞        100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0
5. METHOD B. MODIFICATION OF MAXIMUM LIKELIHOOD EQUATIONS


Although fitting by the first two moments and the proportion in the first class has many 
desirable properties, its efficiency for low values of w is not sufficiently high for it to be 
preferred to the maximum likelihood method, in this region, very often. It appears worth
while, then, to examine how the maximum likelihood equations may be modified to simplify
their solution on the same principles as used above.

If w^k is eliminated from the maximum likelihood equations by the use of (3) and P
replaced by its sample value n₁/n we obtain

$$\bar w = \frac{\bar k + n_1/n}{\bar k + m},$$

$$\frac{m(\bar k + n_1/n)}{\bar k(m - n_1/n)}\,\ln\frac{\bar k + n_1/n}{\bar k + m} + \frac{1}{\bar k} + \sum_{i=1}^{R-1}\frac{1}{\bar k + i}\sum_{j=i+1}^{R}\frac{n_j}{n} = 0, \qquad (16)$$

where the bars denote estimates and R is the highest value observed in the sample. k̄ can
then be found from the second equation and w̄ follows immediately from the first. This will
be called Method B.


The equation in k̄ can be solved by iteration in exactly the same way as the corresponding
equation obtained when the complete negative binomial distribution is fitted by maximum
likelihood methods (Fisher, 1953). Although this iteration takes a little time, particularly
when the distribution has a wide spread, it is less laborious than the procedure required for 
the maximum likelihood fitting of the truncated negative binomial. 
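The two equations (16) can be solved with a few lines of code. Write p = n₁/n and let T_j be the proportion of observations exceeding j. The second equation is then φ(k) = m(k + p)/(k(m − p)) · ln{(k + p)/(k + m)} + Σ_{j=0}^{R−1} T_j/(k + j) = 0, where the j = 0 term is the 1/k term. The sketch below is an illustration, not Brass's procedure. The paper iterates with φ′ and linear interpolation (as in the example of section 8); this version swaps in plain bisection on an assumed bracket [k_lo, k_hi] for robustness. The function name and arguments are hypothetical. It is exercised on expected, rather than sampled, frequencies from (1) with w = 0.5, k = 3. For such population frequencies the equations are satisfied by the true values, so the solver should recover them.

```python
import math


def method_b(counts, k_lo=1e-3, k_hi=100.0, tol=1e-10):
    """Solve equations (16) for (w_bar, k_bar); counts maps each value i >= 1 to n_i."""
    n = sum(counts.values())
    m = sum(i * c for i, c in counts.items()) / n
    p1 = counts.get(1, 0) / n
    R = max(i for i, c in counts.items() if c > 0)       # highest value observed
    # tail[j] = proportion of observations greater than j, for j = 0, ..., R - 1.
    tail = [sum(c for i, c in counts.items() if i > j) / n for j in range(R)]

    def phi(k):
        """Left-hand side of the second equation of (16)."""
        log_term = m * (k + p1) / (k * (m - p1)) * math.log((k + p1) / (k + m))
        return log_term + sum(t / (k + j) for j, t in enumerate(tail))

    f_lo, f_hi = phi(k_lo), phi(k_hi)
    if f_lo * f_hi > 0:
        raise ValueError("phi(k) does not change sign on [k_lo, k_hi]")
    while k_hi - k_lo > tol:
        k_mid = 0.5 * (k_lo + k_hi)
        f_mid = phi(k_mid)
        if f_lo * f_mid <= 0:
            k_hi = k_mid
        else:
            k_lo, f_lo = k_mid, f_mid
    k = 0.5 * (k_lo + k_hi)
    return (k + p1) / (k + m), k                         # first equation of (16)


# Expected relative frequencies from (1) with w = 0.5, k = 3 (cut off at r = 300).
w_true, k_true = 0.5, 3.0
log_norm = k_true * math.log(w_true) - math.log1p(-(w_true ** k_true)) - math.lgamma(k_true)
counts = {r: math.exp(log_norm + math.lgamma(k_true + r) - math.lgamma(r + 1)
                      + r * math.log(1.0 - w_true))
          for r in range(1, 301)}
w_hat, k_hat = method_b(counts)
print(w_hat, k_hat)
```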


6. EFFICIENCY OF METHOD B


The asymptotic variances and covariance of the estimates calculated by Method B are 


8 w? 
10 = (y Py Tal HEN) NIE 2P)(L+)* 


; — Pk?(L +9) + 252EP(L 4-5) In w], 
ny (k) = k+ Pyr Poka +P) + 2P 8) 0 1n wy 
ah "Eq | PY Q4) — (9k P nw}, 
n cov (wk) = (b+ Py Ta Poka +P) + 0P + 97h} 1 +Inw/9} {L+9+Inw} 
VPE 2P) (L +9) e Pn wf — 1 P*(L +) In aul, 


where T = (k-- P) Le 3P( In w]y). (17) 
The determinant of the variance-covariance matrix simplifies to 
wk? P 
mepri) (E+ PO 2) Le Py k+ Pl) In 2% 
eme intu] (18) 
and the asymptotic efficiency is
(E -- P) LN n why 
B(1—P)L—gPü n %% ay 


where B is written for the terms in square brackets in the expression (18). 
Table 4. Percentage efficiency of estimation by Method B

        k    0.5      1      2      3      4      5     10      ∞
  M
  0.5       93.8   96.1   97.4   97.8   98.0   98.1   98.4   98.6
  1         89.9   93.3   95.3   96.0   96.4   96.6   97.0   97.4
  2         85.1   89.5   92.5   93.6   94.2    ...   95.4   96.1
  5         78.3   84.5   89.8   92.3   93.8   94.7   96.6   98.5
  10        73.6   81.5   90.6   94.6   96.6   97.7   99.4  100.0
  ∞        100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0


This efficiency is shown as a percentage in Table 4 for the selected values of M and k. 
For fixed k the efficiency of Method B tends to 100% both as M approaches zero and 
infinity. When M is fixed and k becomes large the limit is the same as for the Method A 
fitting. When k becomes small the efficiency also becomes small, but again so slowly that
this gives little guidance for values of practical importance.

When k is not too small (say > 2) the efficiency of Method B remains at a reasonable level
throughout (not less than 90% approximately). Only for the higher values of M, however,
is the efficiency greater than that of the very much simpler, modified moment, Method A. 
At the lower values of k the efficiencies, in Table 4, are greater than those of the two moment 
methods, but not high enough to suggest that this method of fitting would often be preferred 
to the maximum likelihood procedure. 


7. DISCUSSION


Of the two simplified methods presented, A is very much the more valuable. It gives
estimates of the parameters with very slight labour beyond that necessary for the calcula-
tion of the first two moments, and with an efficiency which is high in the region which
covers many of the cases met in practice. Even outside this region the ease of the calculation
makes it useful for exploratory work and for finding first approximations to the estimates.
In addition the very important parameter M is estimated with an efficiency only a little
short of 100% except when k becomes small. The second method is of limited value, but
when M is high and k not too small (i.e. P is small), it gives estimates which are only slightly
less efficient than maximum likelihood ones and rather easier to calculate.
These conclusions, of course, apply strictly only when the number of observations
becomes very large. When the object of the fitting is to determine the form of the distribution,
moderately large sample sizes will be required to justify an analysis. It seems fair to assume
that, in such instances, the asymptotic theory will give reasonably good approximations
to the true sampling variances, covariances and efficiencies. The situation may be very
different if there are good reasons for assuming that observations will be distributed in the
truncated negative binomial form, and this is fitted to small samples to obtain estimates
of the parameters or functions of the parameters. In such circumstances, the bias in the
estimates by all the methods of fitting mentioned above, both standard and simplified,
may be considerable and it appears possible that the asymptotic values of the variances
5 Biom. 45 
and covariances will not be very good approximations. It is easy to obtain the leading terms
in n⁻¹ in the biases, but the corrections to the variances are very complicated. This problem
is not considered in the present paper.


8. EXAMPLE 


The simplified methods of fitting are illustrated on the data below which were collected by 
the East African Medical Survey in the Kwimba district of Tanganyika territory. The 
observations are of the number of children ever born to a sample of mothers over 40 years 
of age. 


No. of children per mother 


m = 3.9912, s² = 5.9734, n₁/n = 0.1441.


Method A 


When these sample estimates of the moments and the proportion in the first class are 
substituted in equation (5) we have, 


$$\bar w = \frac{3.9912}{5.9734}\,(1 - 0.1441) = 0.572,$$

$$\bar k = \frac{0.572 \times 3.9912 - 0.1441}{1 - 0.572} = 5.00.$$


Approximations to the variances and covariance of these estimates are obtained from 
equation (6) with the estimated w and k in place of the true values. This gives

V(w̄) = 0.003104, V(k̄) = 1.3902, cov(w̄, k̄) = 0.06472.


Method B
It is convenient to write 
Enna mn cmesim 14.5. B 
^ mmn ( t k | n ( kam ) a J 1) P (20) 
Then ag = 


n m- /n 1 R . R 
dk ¢ IU) Bon nay af tin )-; Seir En. (21) 
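φ and φ′ are calculated for some value of k; a second value of k is then found with
the aid of φ′, on the assumption that φ is linear in k over a
short range. Further improved approximations are obtained by linear interpolation
from the values of φ for the preceding two approximations.

    k        φ           φ′
  5.00    −0.00064    −0.00258
  4.75    +0.00008
  4.78     0.00000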


The first approximation to k is taken as 5.0 from the estimate by Method A (normally it
would be necessary to round this estimate to a convenient value). φ and φ′ can then be
calculated quite rapidly. The second approximation to k is

$$5.0 - \frac{-0.00064}{-0.00258} = 4.75.$$

One linear interpolation between 5.0 and 4.75 gives 4.78, for which φ is zero to the accuracy
which is justified by the number of digits calculated for m and n₁/n. Because of the size
of the sampling errors the retention of further digits would only be useful with a much 
larger number of observations. 

From the value of k̄, w̄ can be found directly from the first equation in (16), giving

w̄ = 0.561, k̄ = 4.78.
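The arithmetic of the iteration above can be retraced in a few lines. The values of φ and φ′ are those quoted in the worked example. The first step is a Newton step with φ′, and the second is a linear interpolation between the two trial values. w̄ then follows from the first equation of (16).

```python
# Values quoted in the worked example: phi and phi' at the Method A starting value k = 5.0.
k1, phi1, dphi1 = 5.0, -0.00064, -0.00258
k2 = k1 - phi1 / dphi1                  # Newton step with phi'; the text rounds it to 4.75

# phi at the rounded second approximation, then one linear interpolation (secant step).
k2_rounded, phi2 = 4.75, 0.00008
k3 = k2_rounded - phi2 * (k2_rounded - k1) / (phi2 - phi1)

# w_bar from the first equation of (16), using m and n1/n of the example.
m, p1 = 3.9912, 0.1441
k_bar = round(k3, 2)
w_bar = (k_bar + p1) / (k_bar + m)
print(round(k2, 3), round(k3, 3), k_bar, round(w_bar, 3))
```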


By the substitution of these values in equation (17) the estimated variances and covariance
are obtained,

V(w̄) = 0.003054, V(k̄) = 1.2520, cov(w̄, k̄) = 0.06088.


These estimates, compared with the solutions of the maximum likelihood equations,
are then:

  Method                 w                 k
  A                      0.572 ± 0.056     5.00 ± 1.18
  B                      0.561 ± 0.055     4.78 ± 1.12
  Maximum likelihood     0.565 ± 0.054     4.86 ± 1.12
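The ± figures for Methods A and B are standard errors, that is, the square roots of the estimated variances given above. A quick check:

```python
import math

# Estimated variances from section 8: (V(w_bar), V(k_bar)) for each simplified method.
variances = {"A": (0.003104, 1.3902), "B": (0.003054, 1.2520)}

standard_errors = {
    method: (round(math.sqrt(v_w), 3), round(math.sqrt(v_k), 2))
    for method, (v_w, v_k) in variances.items()
}
print(standard_errors)   # compare with the ± values in the comparison table
```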


The estimates by the various procedures differ little, and the efficiencies of Methods A 
and B are both about 95%. It should be noted that since the values of w and k used in the
calculations are not the same for each method the efficiencies of estimation of parameters 
are not given directly by these comparisons. 


9. EXTENSIONS OF METHOD FOR SIMPLIFYING ESTIMATING EQUATIONS 


In the preceding investigation the awkward exponential type term in the estimating
equations was eliminated by the use of the proportion of observations in the class next to
the point of truncation. This method can be extended to negative binomial distributions
truncated at any point in the lower or upper tail, and also, in the same conditions, to
Poisson and binomial (known or unknown index) distributions. For the truncated Poisson
distribution, simplification by this procedure leads to methods of estimation which have
been studied by Moore (1952, 1954) and Plackett (1953). The relative advantages of the
simplifications introduced by this technique in particular cases can only be assessed by
comparisons of efficiency and ease of calculation. In general, however, it should be more
useful than simplified methods of estimation based on the use of moments of a higher
order, such as those discussed by David & Johnson (1952) and Rider (1955).
THE INTERPRETATION OF THE EFFECTS OF
NON-ADDITIVITY IN THE LATIN SQUARE


By D. R. COX
Birkbeck College, University of London 


1. Introduction. Wilk & Kempthorne (1957) have studied the randomization theory of the 
Latin square, paying particular attention to the effects on the interpretation of the conven- 
tional analysis of variance of the absence of unit-treatment additivity, a topic first discussed 
by Neyman (1935). Wilk & Kempthorne’s paper is a report on part of an extensive investiga- 
tion of the main experimental arrangements, and while the discussion in the present note 
is concerned primarily with the Latin square, the results are in fact of general applicability.

Some of Wilk & Kempthorne’s conclusions need to be recalled, in particular their results 
for the randomization expectations of $M_t$ and $M_r$, the mean squares for treatments and for
residual. Similar results apply to randomization variances and estimated variances of
treatment contrasts. Three important conclusions are:

(i) Suppose that there is unit-treatment additivity, i.e. that the observation obtained by
applying a particular treatment to a particular experimental unit is the sum of a quantity
depending on the unit plus a constant characteristic of the treatment. Then the usual analysis
of variance is unbiased, so that in particular $E(M_t) \ge E(M_r)$, with equality if and only if all
treatments are equivalent. For the null case, see Fisher (1951), Welch (1938), Pitman (1938).

(ii) Suppose that there is not additivity in the sense of (i), i.e. that the treatment effects
vary from unit to unit. Then usually $E(M_t) < E(M_r)$ when the average treatment effects are
zero, the average treatment effects being calculated over all units used in the experiment,
or over a finite population of units from which those used are randomly drawn. To put the
point slightly differently, $(M_t - M_r)/n$ is not, for an n × n square, an unbiased estimate of
the component of variation between treatments, defined in a natural way in terms of the
average treatment effects just mentioned.

(iii) The statistic $(M_t - M_r)/n$ is, however, an unbiased estimate of a quantity $\Sigma_T$, defined
as a certain combination of the population components of variation for treatments,
treatments × rows, treatments × columns and treatments × rows × columns. In fact, in
unpublished work, Wilk has shown that randomization expectations of mean squares can,
for many designs, be expressed simply in terms of appropriate Σ's.

In this note we examine the practical interpretation of the Σ's and a sense in which the
Latin square is always unbiased.

2. A simple situation. Suppose that we have two finite populations $x_1, \ldots, x_M$ and
$y_1, \ldots, y_N$. For example, the x's might be the heights, measured without error, of a group
of trees at site X and the y's the heights of a group of the same species at a different site Y.
Consider the following three questions about the means $\bar x$, $\bar y$ of the two finite populations.
We shall, for convenience, state the questions in the language of significance testing,
although similar estimation problems could be considered.

(A) Is $\bar x = \bar y$?

(B) Do $\bar x$, $\bar y$ differ by more than would be expected if the x's and the y's were random
samples from the same infinite population?

(C) Do $\bar x$, $\bar y$ differ by more than would be expected if the x's and the y's had been
formed by selecting a random permutation of the combined set of M + N values?

It is well known that if M and N are large, the answers to B and C are nearly identical.

It is rather difficult to specify precisely the status of these questions. A is a simple direct
matter with no probabilistic aspects. Except in unusual circumstances we shall find $\bar x \ne \bar y$.
Questions B and C, on the other hand, are entirely hypothetical, in that we are starting
with two finite populations and no objective sampling or permutation procedure is involved.
However, if we regard the variation within populations as haphazard, B and C do seem
useful questions to consider. If, according to B and C, $\bar x$ and $\bar y$ do not differ significantly
at an interesting level, the data are in this respect consistent with having been generated by
a single random process, implying that it may not be profitable to regard the observations
as suggesting or supporting possible physical explanations of the difference or extensions of
the difference to further individuals.

Thus, consider the example above and suppose that site X appears to differ from site Y 
in one particular respect R and is otherwise similar to Y. If $\bar x$, $\bar y$ do not differ significantly
according to B and C, it seems unsound to regard the data, considered alone, as supporting
the idea that R is responsible for a difference in mean heights and as suggesting that future
similar groups of trees, differing by R, will show a difference in mean height similar to that
observed. On the other hand, if $\bar x$ and $\bar y$ do differ significantly, an essentially non-statistical
element is involved in inferring that R is a cause of the difference and that similar differences 
will be observed in the future. Yet the inference does have some cogency. The issues involved 
here are general ones arising when probabilistic methods are applied to data which have 
not been obtained by randomization and which do not belong to clearly defined random 
sequences. 

As remarked previously, the answers to B and C are nearly identical when M, N are
large. The objection to B is that it involves reference to an infinite population which is
... which is mathematically clearly defined and which involves ...
other than those actually obtained.


3. The one-way set-up. Suppose now that we have K finite populations each of N in-
dividuals, the members of the ith population being $x_{i1}, \ldots, x_{iN}$, with mean $\bar x_{i.}$. Wilk &
Kempthorne have defined components of variation between and within populations as

$$\sigma_b^2(x) = \frac{1}{K-1}\sum_{i=1}^{K}(\bar x_{i.} - \bar x_{..})^2, \qquad (1)$$

$$\sigma_w^2(x) = \frac{1}{K(N-1)}\sum_{i=1}^{K}\sum_{j=1}^{N}(x_{ij} - \bar x_{i.})^2, \qquad (2)$$

where $\bar x_{..}$ is the mean of the $\bar x_{i.}$. These are natural descriptive measures of the population
variation: no sampling is involved.

Now in the spirit of C of § 2, let us ask for a measure of how much more variation there is
between the population means than would be expected under random permutation of the $x_{ij}$.
Symmetry considerations require the use of $\sigma_b^2(x) - D\,\sigma_w^2(x)$, where D is a constant to be
determined. If we define $\sigma_b^2(\cdot)$, $\sigma_w^2(\cdot)$ according to (1) and (2) for every regrouping of the
x's into K sets of N, and if $E_p$ denotes expectation over all permutations of the $\{x_{ij}\}$, it is
easy to show that

$$E_p\{\sigma_b^2(\cdot)\} = (1/N)\,E_p\{\sigma_w^2(\cdot)\}.$$

Thus we need D = 1/N in order that our measure shall have zero expectation under per-
mutation of the $\{x_{ij}\}$. Hence we set

$$\Sigma_b(x) = \sigma_b^2(x) - (1/N)\,\sigma_w^2(x), \qquad (3)$$

and call $\Sigma_b(x)$ the component of effective variation between populations. Note that $\Sigma_b(x) < 0$
if the population means differ by less than would be expected under random permutation.
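The claim that $\Sigma_b(x)$ has zero expectation under permutation can be verified exactly for a small case by enumerating every regrouping of the pooled values. The sketch uses exact rational arithmetic and assumes the divisors K − 1 and K(N − 1) of (1) and (2). The populations (K = 2, N = 3) and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def sigma_b2(groups):
    """sigma_b^2 of (1): variance of the group means, divisor K - 1."""
    K = len(groups)
    means = [Fraction(sum(g), len(g)) for g in groups]
    grand = sum(means) / K
    return sum((mu - grand) ** 2 for mu in means) / (K - 1)


def sigma_w2(groups):
    """sigma_w^2 of (2): pooled within-group variation, divisor K(N - 1)."""
    K, N = len(groups), len(groups[0])
    total = Fraction(0)
    for g in groups:
        mu = Fraction(sum(g), N)
        total += sum((v - mu) ** 2 for v in g)
    return total / (K * (N - 1))


def effective_between(groups):
    """Sigma_b(x) = sigma_b^2(x) - sigma_w^2(x)/N, equation (3)."""
    return sigma_b2(groups) - sigma_w2(groups) / len(groups[0])


x = [[1, 4, 6], [2, 9, 13]]          # hypothetical: K = 2 populations of N = 3 values
K, N = len(x), len(x[0])
pooled = [v for g in x for v in g]
values = [effective_between([list(p[i * N:(i + 1) * N]) for i in range(K)])
          for p in permutations(pooled)]
average = sum(values) / len(values)
print(effective_between(x), average)
```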

Suppose now that a random sample of size n is drawn without replacement from each of
the K finite populations and the usual analysis of variance made, giving mean squares
$M_b$, $M_w$. Let E denote expectation in repeated sampling. Then

$$E(M_w) = \sigma_w^2(x), \qquad (4)$$

$$E(M_b) = \sigma_w^2(x)\,(1 - n/N) + n\,\sigma_b^2(x). \qquad (5)$$

The same formulae apply if observations are taken only from a random sample of k out of
the K populations. Thus, if we estimate a component of variance between populations by
the usual infinite-model formula $(M_b - M_w)/n$, we obtain an unbiased estimate of the
component of effective variation between populations.
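Results (4) and (5), and hence the unbiasedness of $(M_b - M_w)/n$ for $\Sigma_b(x)$, can likewise be checked by enumerating every possible sample. In the hypothetical case below, each of K = 2 populations of N = 3 values is sampled with n = 2 without replacement. The data and the function name are illustrative assumptions. Applied to a whole population, the one-way mean squares give $M_b = N\sigma_b^2$ and $M_w = \sigma_w^2$ under the divisors of (1) and (2).

```python
from fractions import Fraction
from itertools import combinations, product


def mean_squares(samples):
    """Between and within mean squares (M_b, M_w) of a one-way analysis with equal group sizes."""
    k, n = len(samples), len(samples[0])
    means = [Fraction(sum(s), n) for s in samples]
    grand = sum(means) / k
    m_b = n * sum((mu - grand) ** 2 for mu in means) / (k - 1)
    m_w = sum((v - mu) ** 2 for s, mu in zip(samples, means) for v in s) / (k * (n - 1))
    return m_b, m_w


pops = [[1, 4, 6], [2, 9, 13]]       # hypothetical: K = 2 populations of N = 3 values
N, n = 3, 2
pop_b, pop_w = mean_squares(pops)    # N * sigma_b^2 and sigma_w^2
target = pop_b / N - pop_w / N       # Sigma_b(x) of (3)

# Every way of drawing n units without replacement from each population, equally likely.
draws = list(product(*(combinations(p, n) for p in pops)))
estimates = []
for d in draws:
    m_b, m_w = mean_squares([list(s) for s in d])
    estimates.append((m_b - m_w) / n)
expected = sum(estimates) / len(estimates)
print(expected, target)
```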


4. The two-way set-up. Suppose now that we have a population set-up with R rows and
C columns, and that the value in the ith row and jth column is $x_{ij}$. Wilk & Kempthorne
define population components of variation for rows × columns, for rows and for columns by

$$\sigma_{RC}^2 = \frac{1}{(R-1)(C-1)}\sum_{i,j}(x_{ij} - \bar x_{i.} - \bar x_{.j} + \bar x_{..})^2, \qquad (6)$$

$$\sigma_R^2 = \frac{1}{R-1}\sum_i(\bar x_{i.} - \bar x_{..})^2, \qquad (7)$$

$$\sigma_C^2 = \frac{1}{C-1}\sum_j(\bar x_{.j} - \bar x_{..})^2, \qquad (8)$$

where a dot denotes an average. To measure the population components of effective varia-
tion between, say, rows, it is natural to construct a quantity that is unaffected by constant
differences between columns, and this means taking a combination of $\sigma_R^2$, $\sigma_{RC}^2$. If we further
require that our measure has expectation zero when the separate columns of $\{x_{ij}\}$ are per-
muted in all possible ways, we are led to the definitions

$$\Sigma_R = \sigma_R^2 - (1/C)\,\sigma_{RC}^2, \qquad (9)$$

$$\Sigma_C = \sigma_C^2 - (1/R)\,\sigma_{RC}^2, \qquad (10)$$

and for completeness we set

$$\Sigma_{RC} = \sigma_{RC}^2. \qquad (11)$$
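Definition (9) can be checked in the same way. Permuting the entries of each column independently, in all possible ways, should leave $\Sigma_R$ with an average of exactly zero. The sketch assumes the divisors R − 1, C − 1 and (R − 1)(C − 1) for the row, column and interaction components, and uses a hypothetical 3 × 2 table. The function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations, product


def components(x):
    """sigma_R^2, sigma_C^2, sigma_RC^2 for an R x C table x[i][j]."""
    R, C = len(x), len(x[0])
    row = [Fraction(sum(r), C) for r in x]
    col = [Fraction(sum(x[i][j] for i in range(R)), R) for j in range(C)]
    grand = sum(row) / R
    s_r = sum((r - grand) ** 2 for r in row) / (R - 1)
    s_c = sum((c - grand) ** 2 for c in col) / (C - 1)
    s_rc = sum((x[i][j] - row[i] - col[j] + grand) ** 2
               for i in range(R) for j in range(C)) / ((R - 1) * (C - 1))
    return s_r, s_c, s_rc


def sigma_rows(x):
    """Sigma_R = sigma_R^2 - sigma_RC^2 / C, equation (9)."""
    s_r, _, s_rc = components(x)
    return s_r - s_rc / len(x[0])


x = [[3, 1], [5, 2], [10, 8]]        # hypothetical 3 x 2 population table
R, C = len(x), len(x[0])
columns = [[x[i][j] for i in range(R)] for j in range(C)]
values = []
for perms in product(*(permutations(c) for c in columns)):
    table = [[perms[j][i] for j in range(C)] for i in range(R)]
    values.append(sigma_rows(table))
average = sum(values) / len(values)
print(sigma_rows(x), average)
```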


Wilk & Kempthorne give results analogous to (4) and (5) to the effect that if r rows and
c columns are drawn randomly without replacement and the corresponding observations
analysed, unbiased estimates of $\Sigma_R$ and $\Sigma_C$ may be obtained by using the ordinary infinite-
model component of variance formulae just as if both R and C were infinite.
5. The Latin square. Consider now the model for the Latin square. Let the experimental
units be set out in the R × C array of § 4 and suppose that there is a conceptual observation
$x_{ijk}$ that would be obtained if the kth treatment, k = 1, ..., n, were applied to the (i, j)th
unit ...

... three-dimensional set is more difficult and requires less direct arguments.
First, ordinary components of variation may be defined by formulae such as


1 9 
= NI. - 2. y, (12) 
1 
* (C—1) (n - 1) Tür-. 4. . T2. oj. (13) 


etc. It seems reasonable to require that our measure of effective variation for, say, rows ×
treatments should be unaffected by arbitrary changes in rows, treatments, rows × columns
and columns × treatments, i.e. in the effects not involving rows × treatments. We can ensure
this by considering population residuals eliminating the effects just mentioned, i.e. 


20 (Rt) = r y. — . % L. (14) 


A quadratic form in the quantities (14) with the requisite symmetry is a combination of 
Gia and ch and hence we consider measures of effective variation of the form 0 — Fo}, 
where F is a constant to be determined. If now the expectation is to be zero when the 
u xt) are permuted completely randomly, we find F = 1 KR n) and hence we put 


Em = Oh gos Chen (15) 


do not involve treatments, define residuals by 


2i (0) = (rg =. ) — (vi. (16) 


and hence consider expressions of the form $\sigma_T^2 - G\sigma_{RT}^2 - H\sigma_{CT}^2 - J\sigma_{RCT}^2$, where G, H, J are to
be determined. If the $z_{ijk}(T)$ are permuted randomly, the expectation is


Gs HU YS) 
(zo CR JJ Fe, qn 
and hence we require

$$RG + CH + RCJ = 1. \qquad (18)$$


Now the quantity used by Wilk & Kempthorne has G = 1/R, H = 1/C, J = −1/(RC) and
so satisfies (18), but we need two extra conditions in order to establish their quantity
uniquely from the present considerations.


This can be done by requiring that if, say, 
similar requirement on rows gives H = 1/C. (18) then requires J = −1/(RC). Thus we set

$$\Sigma_T = \sigma_T^2 - \frac{1}{R}\,\sigma_{RT}^2 - \frac{1}{C}\,\sigma_{CT}^2 + \frac{1}{RC}\,\sigma_{RCT}^2, \qquad (19)$$

with similar definitions for $\Sigma_R$, $\Sigma_C$.

Wilk & Kempthorne show that the expectation of $(M_t - M_r)/n$ is $\Sigma_T$, so that the
conventional analysis of the Latin square gives an unbiased estimate of the component of effective
variation between treatments in the presence of arbitrary treatment-unit interactions.

6. Discussion. The main issue for discussion concerns whether the interpretation put
upon $\Sigma_T$ is of sufficient interest to make the hypothesis $\Sigma_T = 0$ of practical scientific
importance comparable to or greater than that of the null hypothesis $\sigma_T^2 = 0$. If $\Sigma_T$ is considered
important, there is a sense in which the Latin square is unbiased, and in which the residual
mean square estimates an appropriate error for treatment contrasts, whether or not there 
is unit-treatment additivity. This is the view put forward here, although there is certainly 
need for further discussion of the reasoning involved.

This is, of course, in no way to say that treatment-unit interactions should be disregarded
in the design and interpretation of experiments. On the contrary, if substantial variations
in treatment effect from unit to unit do occur, one's understanding of the experimental 
situation will be very incomplete until the basis of this variation is discovered and any 
extension of the conclusions to a general set of experimental units will be hazardous. The 
mean treatment effect, averaged over all units in the experiment, or over the finite popula- 
tion of units from which they are randomly drawn, may in such cases not be too helpful. 
Particularly if appreciable systematic treatment-unit interactions are suspected, the 
experiment should be set out so that these may be detected and explained.

But suppose that we do decide to look at average treatment effects. The situation is 
quite parallel to the comparison of two finite populations discussed in § 2. For each experi- 
mental unit there is a conceptual true difference between each pair of treatments and the 
hypothesis v, = 0 is analogous to question A in that it is concerned with whether these 
differences all average out to zero over the finite population. The hypothesis Y, = 0 is 
analogous to question C (or B) in that it is concerned with whether the grouping of the 
differences is significant among a set of permutations of the effects. If we consider that C 
(or B) is often the more useful question to consider statistically in § 2, the same will be true 
in the more elaborate situation. The treatment differences averaged over the finite population
always remain perfectly definite quantities and in some cases may be the only things
requiring consideration, as in those rare cases in which the sole units to which it is required 
to apply the conclusions about the treatments form a finite population from which the units 
used are randomly drawn. 
support from the National Science Foundation is gratefully acknowledged. I wish also to
QUANTAL RESPONSES TO MIXTURES OF POISONS UNDER
CONDITIONS OF SIMPLE SIMILAR ACTION—THE 
ANALYSIS OF UNCONTROLLED DATA 


By J. R. ASHFORD 
Pneumoconiosis Field Research, National Coal Board 


1. INTRODUCTION 


Research in industrial medicine is frequently directed towards the assessment of the hazard 
associated with a particular process or environment. Under certain circumstances this 


various levels of the hazard affect the same physiological system and do not interact. This 
is equivalent to quantal responses to a mixture of poisons under conditions of ‘similar joint 
action without interaction’ or ‘simple similar action’, in the conventional terms of bio- 
logical assay. The characteristic feature of the problem is, however, that the experimental 
conditions are not subject to control by the investigator and it is thus not possible to make
any deliberate choice of the levels of dosage. 

A comprehensive review of models for the joint action of poisons has been given by 
Plackett & Hewlett (1952), but these authors consider that, except in special cases, the 
analysis of uncontrolled data is not worthwhile. The purpose of this paper is to consider 
and compare the various alternative methods of estimating the toxicity of individual 
poisons which may be administered singly or jointly to a population of living organisms 
under conditions of simple similar action, and to derive a method of analysis which is 
applicable under the most general experimental conditions. 


2. THE ACTION OF A SINGLE POISON 
The statistical theory relating to quantal responses to single poisons is well established and 
the application of the standard techniques is familiar in many branches of biological 
research. The approach normally adopted consists essentially of the formulation of a hypo- 
thetical model to describe the action of the poison, in terms of the relationship between the 
dose and the probability that an individual organism selected at random from the population 
will manifest the characteristic response when this dose is applied. 

The ‘tolerance’ of a particular organism is defined as that dose which would be just 
sufficient to produce the response. Thus, for any dose exceeding the tolerance the subject 
would respond, whereas for any lesser dose it would not respond. The individual tolerances 
may be expected to show some variation from one organism to another, and it is therefore 
necessary to consider the distribution over the whole population. If the poison is applied 
at dose x the probability of response may be expressed in the form, 


$$P(x) = \int_0^x \psi(\xi)\,d\xi, \qquad (1)$$

where ψ(ξ) is the frequency function of the distribution of the individual tolerances. In
practice this distribution may be markedly skew, but it is often possible, by means of a
transformation, say f(x), of the dose x, to obtain a tolerance distribution in terms of the
transformed dose f(x) which is symmetrical in form. It is usual to assume that the parameters
contained in f(x) are chosen in such a way that the tolerance distribution is of some
specified form, say φ(ξ), having zero mean and unit variance and, in general, covering the
range of values of f(x) from −∞ to ∞. Under these circumstances f(x) is termed an 'equivalent
deviation' and the probability of response at dose x may be expressed in the form

$$P(x) = \int_{-\infty}^{f(x)} \phi(\xi)\,d\xi. \qquad (2)$$


Experience has shown that the equivalent deviation may commonly be represented by 
a linear function of the logarithm of the dose, of the form 
$$f(x) = a + b \log x, \qquad (3)$$

where the two parameters a and b characterize the response of the particular poison applied.
In general, different poisons would lead to different values of a and b.

Various mathematical forms have been suggested to represent the distribution of tolerances.
The normal distribution was originally proposed by Fechner in connexion with psychometric
data. The first reference to the use of this distribution in biological assay
was by Gaddum (1933) and the assumption of a normal distribution of the logarithm of the
tolerance dose has since been made for a wide variety of data relating to quantal responses 
to a single poison. The use of the logistic function to represent the distribution of tolerance
values has also been suggested and applications have been described by Wilson & Worcester
(1943) and Berkson (1944). Other expressions, such as the 'angle' function (Knudson
& Curtiss, 1945), P = sin² f, and the rectangular function have been considered, but for the
most part analyses of quantal responses to a single poison are carried out on the assumption
of a normal or logistic distribution of tolerances.

Under the assumption of a normal distribution of tolerance values of the equivalent
deviation, the probability of response at dose x is

$$P(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{f(x)} e^{-\xi^2/2}\,d\xi. \qquad (4)$$

The corresponding expression* for the logistic distribution is

$$P(x) = [1 + \exp(-f(x))]^{-1}. \qquad (5)$$
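As an illustration of equations (3), (4) and (5), the following Python sketch evaluates the probability of response at a dose x under the normal and the logistic tolerance distributions. The use of natural logarithms in (3) and the parameter values in the usage note are assumptions made only for the example.

```python
import math


def equivalent_deviation(x: float, a: float, b: float) -> float:
    """Equation (3): f(x) = a + b log x (natural logarithms assumed here)."""
    return a + b * math.log(x)


def p_normal(x: float, a: float, b: float) -> float:
    """Equation (4): normal distribution of tolerances on the equivalent-deviation scale."""
    f = equivalent_deviation(x, a, b)
    return 0.5 * (1.0 + math.erf(f / math.sqrt(2.0)))


def p_logistic(x: float, a: float, b: float) -> float:
    """Equation (5): P(x) = [1 + exp(-f(x))]^(-1)."""
    f = equivalent_deviation(x, a, b)
    return 1.0 / (1.0 + math.exp(-f))
```

For a hypothetical poison with a = −2 and b = 1, the dose x = e² gives f(x) = 0, so both models predict a response probability of 0.5. At any other dose the two curves differ only slightly in shape.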


3. THE ACTION OF MIXTURES OF POISONS 


The assumption of conditions of simple similar action implies that the poisons making up 
the mixtures have a common mechanism of action. Thus, the basic concepts of the equi- 
valent deviation and the distribution of tolerance values of the equivalent deviation both 
hold good. The methods of approach applied to the problem of quantal responses to a single 
poison may therefore be extended to cover the action of mixtures of poisons. 

If the mixture is made up of w different poisons X, Y, ..., T applied at dose (x, y, ..., t),
the equivalent deviation must take the form of a function of the combined dose, say
f(x, y, ..., t). For any particular organism it is assumed that there exists a tolerance value
of the equivalent deviation such that, for combinations of doses leading to equal or
greater values of f(x, y, ..., t), the organism will respond, whereas for combinations leading
to lesser values it will not respond. There will, in general, be a range of values of the dose
(x, y, ..., t) corresponding to a particular value of the equivalent deviation.

To satisfy the conditions of simple similar action it is necessary that the equivalent 
deviation should have the following properties: 

(a) The poisons making up the mixture do not interact. This means that the relative change in the
probability of response (and thus in the equivalent deviation) associated with a small change in the
dose of any one poison is independent of the dose of any of the other poisons.

We have, from (2),

$$P(x, y, \ldots, t) = \int_{-\infty}^{f(x, y, \ldots, t)} \phi(\xi)\,d\xi. \qquad (6)$$

The probability of response at any combined dose may therefore be expressed as a function of the
equivalent deviation. Now, without loss of generality, we may assume that the equivalent deviation
may be expressed in the form

$$f(x, y, \ldots, t) = F[g(x, y, \ldots, t)],$$

where g is itself a function of the dose (x, y, ..., t). Hence we have

$$P(x, y, \ldots, t) = \int_{-\infty}^{F[g(x, y, \ldots, t)]} \phi(\xi)\,d\xi.$$


* This definition corresponds to that given by Be: i i 
definition of the logit suggested by Roy ( 1952). e iwi: iia B 
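Now consider the effect of making small changes (dx, dy, ..., dt) in the dose. The change in the probability
of response is

$$dP = \phi(f)\,F'(g)\left(\frac{\partial g}{\partial x}\,dx + \frac{\partial g}{\partial y}\,dy + \cdots + \frac{\partial g}{\partial t}\,dt\right).$$

By condition (a), the expression ∂g/∂r must be a function of r only. This relationship must hold for all
values of r (r = x, y, ..., t), and hence

$$g(x, y, \ldots, t) = g_x(x) + g_y(y) + \cdots + g_t(t), \qquad (7)$$

$$g_r(0) = 0, \qquad (8)$$

and

$$f(0, 0, \ldots, r, \ldots, 0) = a_r + b_r \log r. \qquad (9)$$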


It is, therefore, necessary that f(x, y, ..., t) should contain the 2w parameters a_r, b_r (r = x, y, ..., t).

(c) The equivalent deviation must be a monotonic increasing function of the dose of any one of the
constituent poisons, whatever the values of the doses of the other poisons. That is to say, for any set
of values of the combined dose, an increase in the dose of any one of the constituent poisons must lead
to a corresponding increase in the equivalent deviation and consequently in the probability of response.

For tolerance distributions such as the normal and logistic, which cover the whole range (−∞, ∞),
the equivalent deviation must increase monotonically from −∞ to ∞ as the combined dose increases
from zero for all poisons to ∞ for any one poison.

(d) If the equivalent deviation contains logarithms then it must remain invariant under a change of
the base of the logarithms. Thus, if the equivalent deviation contains the parameters a_r and b_r [where
a_r and b_r characterize the action of the poison R when applied singly] and the base of logarithms is
changed, the corresponding parameters a_r', b_r' must also characterize the action of the poison R under
the new base of logarithms.

If the logarithms are taken to base h, the equivalent deviation for the poison R when applied singly is

$$f_r(r) = a_r + b_r \log_h r.$$

If the base of logarithms is changed to k we have

$$f_r(r) = a_r + (b_r \log_h k) \log_k r.$$

Hence, under a change of base of logarithms from h to k,

$$a_r' = a_r \quad \text{and} \quad b_r' = (\log_h k)\, b_r.$$


If the equivalent deviation contains parameters a_r, b_r (r = x, y, ..., t) and θ_1, θ_2, ..., θ_m, and the
logarithms are taken to base h or base k, then

$$f_h(x, y, \ldots, t \mid a_r, b_r;\ \theta_1, \ldots, \theta_m) = f_k(x, y, \ldots, t \mid a_r, b_r \log_h k;\ \theta_1', \ldots, \theta_m'). \qquad (10)$$


(e) Any parameter contained in the equivalent deviation must possess a range of values leading to 
a real value of this function whatever the combined dose. 


It has been shown that the equivalent deviation for a single poison R may be expressed
in the form

$$f(r) = a_r + b_r \log_h r = \log_h\left[h^{a_r} r^{b_r}\right].$$

Taken in conjunction with condition (a), this suggests that the equivalent deviation for
a mixture of poisons is of the form

$$f(x, y, \ldots, t) = \log_h\left[\sum_{r=x}^{t} h^{a_r} r^{b_r}\right]. \qquad (11)$$
It may be seen by inspection that expression (11) satisfies conditions (a), (b), (c) and (e).
When, however, the base of logarithms is changed from h to k,

$$f(x, y, \ldots, t) = \log_h k \cdot \log_k\left[\sum_{r=x}^{t} k^{a_r/\log_h k}\, r^{b_r}\right],$$

condition (d) above is not satisfied and it is necessary to introduce an additional parameter
θ and to consider an equivalent deviation of the form

$$f(x, y, \ldots, t) = \theta \log_h\left[\sum_{r=x}^{t} h^{(a_r + b_r \log_h r)/\theta}\right]. \qquad (12)$$

If the base of logarithms is changed from h to k we have

$$f(x, y, \ldots, t) = \theta' \log_k\left[\sum_{r=x}^{t} k^{(a_r + b_r' \log_k r)/\theta'}\right],$$

where θ' = θ log_h k and b_r' = b_r log_h k.

An equivalent deviation of the form (12) thus satisfies all the necessary conditions. 
However, the process by which this expression was derived does not (and cannot reasonably 
be expected to) lead to a unique solution and it may well be that there are other functions 
which satisfy the requirements of the situation. In the absence of any detailed information 
about the exact mode of action of the mixture of poisons on the physiological system of the 
subjects concerned, it is considered that preference should be given to the expression which 
is least complicated mathematically and which involves the introduction of the minimum 
number of parameters, provided the experimental evidence throws no doubt on the validity 
of the assumption. The expression (12) is considered to represent the simplest non-trivial 
mathematical form consistent with the conditions given above and includes only one 
parameter in addition to the 2w parameters required to describe the action of the w poisons 
when applied individually. Furthermore, the examination of a considerable body of data 
confirms that this expression does provide an adequate representation of the action of 
mixtures of poisons under conditions of simple similar action. The assumption of the form 
(12) for the equivalent deviation is thus considered to be justified. An alternative derivation,
based on certain hypotheses concerning the transfer of the poisons from the site of dosage 
to the site of action, has been given by Plackett & Hewlett (1952). 
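The invariance argument for (12) can be checked numerically. The Python sketch below is our own illustration, not the author's program. It evaluates (12), writing each term as h^(a_r/θ) r^(b_r/θ) so that a zero dose contributes nothing when b_r > 0. It then confirms two properties: changing the base from h to k, with b_r' = b_r log_h k and θ' = θ log_h k, leaves the equivalent deviation unchanged; and with only one poison present, (12) reduces to a_r + b_r log_h r. The helper names, doses and parameter values are hypothetical.

```python
import math


def equivalent_deviation(doses, a, b, theta, base=10.0):
    """Equation (12): theta * log_base( sum_r base**((a_r + b_r*log_base r)/theta) ).

    Each term is computed as base**(a_r/theta) * r**(b_r/theta), which is the same
    quantity but is also defined (and equal to zero) for a zero dose when b_r > 0.
    """
    total = sum(base ** (ar / theta) * r ** (br / theta)
                for r, ar, br in zip(doses, a, b))
    return theta * math.log(total, base)


def change_base(b, theta, old_base, new_base):
    """Parameters after a change of logarithm base from h = old_base to k = new_base."""
    factor = math.log(new_base, old_base)      # log_h k
    return [br * factor for br in b], theta * factor
```

For example, with hypothetical doses (2, 5, 3), a = (−1, −2, −1.5), b = (1.2, 0.8, 1.0) and θ = 0.7, the value computed with base 10 equals the value computed with base e after `change_base`. Note that (11) corresponds to θ = 1, and it cannot keep θ = 1 under such a change unless log_h k = 1.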


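On writing

$$a_r + b_r \log_h r = l_r, \qquad \sum_{r=x}^{t} l_r = w\bar{l},$$

(12) may be expressed in the form

$$f(x, y, \ldots, t) = \bar{l} + \theta \log_h \sum_{r=x}^{t} \exp\left[\frac{(l_r - \bar{l}) \log_e h}{\theta}\right] \approx \bar{l} + \theta \log_h w + \frac{\log_e h}{2w\theta} \sum_{r=x}^{t} (l_r - \bar{l})^2.$$

Now if the contribution from each of the poisons is close to the mean, the last term is negligible
and we have

$$f(x, y, \ldots, t) \approx \frac{1}{w} \sum_{r=x}^{t} (a_r + b_r \log_h r) + \theta \log_h w. \qquad (13)$$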
Thus, if the component doses are such that the terms (a_r + b_r log_h r) are close to the average,
the equivalent deviation may be expressed as a linear function of the (2w + 1) parameters.
Under these conditions constant values of the equivalent deviation, and thus constant
values of the probability of response, are given by loci of the form

$$\sum_{r=x}^{t} b_r \log r = \text{constant}, \qquad (14)$$

i.e.

$$\prod_{r=x}^{t} r^{b_r} = \text{constant}. \qquad (15)$$
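The accuracy of approximation (13) can be seen numerically. The sketch below works directly in terms of l_r = a_r + b_r log_h r. It compares the exact form of (12) with the linear approximation (13), and with (13) plus the second-order term (log_e h/(2wθ)) Σ(l_r − l̄)². The function names and test values are our own.

```python
import math


def exact_f(l, theta, base=10.0):
    """Equation (12) written in terms of l_r = a_r + b_r log_h r."""
    return theta * math.log(sum(base ** (lr / theta) for lr in l), base)


def linear_f(l, theta, base=10.0):
    """Equation (13): mean of the l_r plus theta log_h w."""
    w = len(l)
    return sum(l) / w + theta * math.log(w, base)


def quadratic_f(l, theta, base=10.0):
    """(13) plus the second-order term log_e(h) / (2 w theta) * sum (l_r - mean)^2."""
    w = len(l)
    mean = sum(l) / w
    spread = sum((lr - mean) ** 2 for lr in l)
    return linear_f(l, theta, base) + math.log(base) * spread / (2 * w * theta)
```

When all the l_r are equal, (12) and (13) coincide exactly. Otherwise (12) always exceeds (13), because the sum in (12) is at least w times h raised to the power l̄/θ. For closely spaced l_r, the quadratic term accounts for nearly all of the difference.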


In the special case where the response lines for all the constituent poisons when applied
singly are parallel (i.e. b_r = b), the expression (14) reduces to the form

$$\sum_{r=x}^{t} \log r = \text{constant}.$$
Thus, for mixtures made up of poisons which would be suitable for comparative assay, the 
probability of response is approximately constant for combined doses whose components 
are such that the sum of their logarithms is constant. 


4. ESTIMATION OF PARAMETERS


The estimation of the two parameters associated with quantal responses to a single poison 
has been widely discussed and a number of alternative procedures have been proposed. 
The method of maximum likelihood was first applied to quantal response data by Bliss 
(1935) and has since been adopted in many branches of biological assay. This method, which 
has been described in detail by Finney (1952), involves an iterative process of successive 
approximation to the final result and the necessary calculations are generally rather
cumbersome. A procedure based on the minimization of the heterogeneity χ² has also been
suggested. For any postulated form of tolerance distribution this involves an iterative 
process similar to the maximum likelihood solution and the method does not appear to 
have been applied in practice on any considerable scale. 

A modified form of the minimum χ² procedure, which has been termed the 'minimum
logit χ²' method, has been proposed by Berkson (1944). This method is based on the
minimization of an approximate expression for the heterogeneity χ² and leads, in association
with the assumption of a logistic distribution of tolerances, to a direct solution for the two
parameters. In comparison with the maximum likelihood or minimum χ² methods, the
minimum logit χ² procedure permits an appreciable reduction in the effort required to
compute the parameters in any particular case. 

In view of the basic similarity between the action of single poisons and that of mixtures 
of poisons under conditions of simple similar action, consideration has been given to the 
application of both maximum likelihood and minimum logit χ² procedures. It is assumed
that the test subjects are assigned at random to k groups and that the ith group includes
n_i subjects, of which r_i manifest the characteristic response when the group is exposed to
a mixture of w poisons X, Y, ..., T applied at dose (x_i, y_i, ..., t_i).

(a) Maximum likelihood method 
The probability of r_i responses in the ith group is given, by the binomial distribution, as

$$P(r_i) = \binom{n_i}{r_i} P_i^{r_i} Q_i^{n_i - r_i}, \qquad (16)$$

where the theoretical probability of response is

$$P_i = \int_{-\infty}^{f_i} \phi(\xi)\,d\xi, \qquad f_i = f(x_i, y_i, \ldots, t_i),$$

and Q_i = 1 − P_i.
Thus, the logarithm of the likelihood of any given set of observations may be written as

$$L = \text{constant} + \sum_{i=1}^{k} \left[r_i \log P_i + (n_i - r_i) \log Q_i\right]. \qquad (17)$$

The maximum likelihood estimates of the (2w + 1) parameters u (= θ or a_r, b_r [r = x, y, ..., t])
contained in f(x, y, ..., t) may be calculated directly or by the solution of (2w + 1) equations
of the form

$$\frac{\partial L}{\partial u} = \sum_{i=1}^{k} \frac{n_i (p_i - P_i)}{P_i Q_i} \frac{\partial P_i}{\partial u} = 0, \qquad (18)$$

where p_i = r_i/n_i is the observed proportion responding.
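The asymptotic variances and covariances of the maximum likelihood estimates may be
evaluated in the usual way and result in

$$[\operatorname{cov}(\hat{u}, \hat{v})] = \left[\sum_{i=1}^{k} \frac{n_i}{P_i Q_i} \frac{\partial P_i}{\partial u} \frac{\partial P_i}{\partial v}\right]^{-1}. \qquad (19)$$

The fit of the observations may be tested by calculating the value of the heterogeneity χ².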
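Equations (17) to (19) translate directly into code. The Python sketch below is written for an arbitrary model. The caller supplies the fitted probabilities P_i and their derivatives ∂P_i/∂u for each parameter u, so the same functions serve for the mixture form (12) or for a single poison. The function names and the list-of-lists layout (dP[i][j] holds ∂P_i/∂u_j) are our own choices.

```python
import math


def log_likelihood(n, r, P):
    """Equation (17), omitting the constant: sum r_i log P_i + (n_i - r_i) log Q_i."""
    return sum(ri * math.log(Pi) + (ni - ri) * math.log(1.0 - Pi)
               for ni, ri, Pi in zip(n, r, P))


def score(n, r, P, dP):
    """Left-hand sides of (18): sum n_i (p_i - P_i) / (P_i Q_i) * dP_i/du_j."""
    m = len(dP[0])
    return [sum(ni * (ri / ni - Pi) / (Pi * (1.0 - Pi)) * dPi[j]
                for ni, ri, Pi, dPi in zip(n, r, P, dP))
            for j in range(m)]


def information(n, P, dP):
    """The matrix inverted in (19): sum n_i / (P_i Q_i) * dP_i/du_j * dP_i/du_k."""
    m = len(dP[0])
    return [[sum(ni / (Pi * (1.0 - Pi)) * dPi[j] * dPi[k]
                 for ni, Pi, dPi in zip(n, P, dP))
             for k in range(m)]
            for j in range(m)]
```

The variance-covariance matrix of (19) is the inverse of `information(...)`. A simple check on any model's derivatives is to compare `score` with a finite-difference derivative of `log_likelihood`.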


(b) Minimum logit χ² method
The heterogeneity of the set of observations given above may be expressed in the form

$$\chi^2 = \sum_{i=1}^{k} \frac{n_i (p_i - P_i)^2}{P_i Q_i}. \qquad (20)$$


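Unlike the method of maximum likelihood, which may be applied to any form of tolerance
distribution, the minimum logit χ² procedure depends basically on the assumption of a
logistic distribution of tolerances. Under these circumstances we have, from (5),

$$P_i = [1 + \exp(-f(x_i, y_i, \ldots, t_i))]^{-1}$$

and

$$\frac{\partial P_i}{\partial f_i} = \frac{\exp(-f_i)}{[1 + \exp(-f_i)]^2} = P_i Q_i. \qquad (21)$$

Now if \hat{f}_i is the logit corresponding to the observed proportion p_i we have, for |p_i − P_i| and
|\hat{f}_i − f_i| small,

$$\frac{p_i - P_i}{\hat{f}_i - f_i} = \left(\frac{\partial P}{\partial f}\right)_{P = P_i^*}, \qquad (22)$$

where P_i^* is some value between p_i and P_i. Thus from (21)

$$p_i - P_i \approx (\hat{f}_i - f_i) P_i Q_i.$$

Hence

$$\chi^2 \approx \sum_{i=1}^{k} n_i P_i Q_i (\hat{f}_i - f_i)^2.$$

Now, writing q_i = 1 − p_i,

$$P_i Q_i \approx p_i q_i, \qquad (23)$$

and

$$\chi^2 \approx \sum_{i=1}^{k} n_i p_i q_i (\hat{f}_i - f_i)^2. \qquad (24)$$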
The method proposed by Berkson is based on the minimization of the expression (24),
which does not involve the theoretical probabilities of response P_i. This may be carried out
directly or by the solution of (2w + 1) equations of the form

$$\frac{\partial \chi^2}{\partial u} = -2 \sum_{i=1}^{k} n_i p_i q_i (\hat{f}_i - f_i) \frac{\partial f_i}{\partial u} = 0. \qquad (25)$$


In the case of quantal responses to a single poison the equivalent deviation may be 
expressed as a linear function of the parameters to be estimated and the equations (25) 
reduce to a pair of simultaneous linear equations. When a mixture of poisons is considered, 
however, no direct solution is possible and it is necessary to employ an iterative procedure 
of successive approximation to the final result, as for the maximum likelihood solution. 
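For a single poison with f_i = a + b log x_i, (24) is a weighted least-squares criterion. Equations (25) then become the two normal equations of a weighted linear regression of the observed logit on log dose, with weights n_i p_i q_i. The Python sketch below solves them directly. Skipping groups with 0% or 100% response is our own simplification; these are the groups for which the observed logit is undefined.

```python
import math


def min_logit_chi2(n, r, x):
    """Berkson's minimum logit chi-square for one poison, f = a + b*log(x).

    Minimizes sum n_i p_i q_i (fhat_i - a - b*log x_i)^2, i.e. solves the two
    linear equations (25) for a and b.
    """
    sw = swz = swy = swzz = swzy = 0.0
    for ni, ri, xi in zip(n, r, x):
        p = ri / ni
        if p <= 0.0 or p >= 1.0:
            continue                     # logit infinite: group skipped in this sketch
        w = ni * p * (1.0 - p)           # weight n_i p_i q_i
        y = math.log(p / (1.0 - p))      # observed logit fhat_i
        z = math.log(xi)                 # log dose
        sw += w
        swz += w * z
        swy += w * y
        swzz += w * z * z
        swzy += w * z * y
    z_bar, y_bar = swz / sw, swy / sw
    b = (swzy - sw * z_bar * y_bar) / (swzz - sw * z_bar * z_bar)
    a = y_bar - b * z_bar
    return a, b
```

In a symmetric example with 2, 5 and 8 responders out of 10 at log doses −1, 0 and 1, the observed logits lie exactly on a line. The sketch recovers a = 0 and b = log 4.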

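The minimum logit χ² estimates are Regular Best Asymptotic Normal (Taylor, 1953)
and the asymptotic variance-covariance matrix is similar to that for the maximum likelihood
estimates.

From (19) and (21) we have

$$[\operatorname{cov}(\hat{u}, \hat{v})] = \left[\sum_{i=1}^{k} n_i P_i Q_i \frac{\partial f_i}{\partial u} \frac{\partial f_i}{\partial v}\right]^{-1}. \qquad (26)$$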


An approximate test for the 'fit' of the observed data may be obtained by calculating
the value of χ² at the minimum by means of expression (24).
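The quality of approximation (24) can be checked by evaluating both (20) and (24) for the same fitted equivalent deviations f_i. The sketch below assumes the logistic form (5) for P_i. The group sizes and fitted values used in the check are invented.

```python
import math


def chi2_exact(n, r, f):
    """Equation (20), with P_i taken from the logistic form (5)."""
    total = 0.0
    for ni, ri, fi in zip(n, r, f):
        P = 1.0 / (1.0 + math.exp(-fi))
        p = ri / ni
        total += ni * (p - P) ** 2 / (P * (1.0 - P))
    return total


def chi2_logit(n, r, f):
    """Equation (24): sum n_i p_i q_i (fhat_i - f_i)^2; requires 0 < r_i < n_i."""
    total = 0.0
    for ni, ri, fi in zip(n, r, f):
        p = ri / ni
        q = 1.0 - p
        f_hat = math.log(p / q)          # observed logit
        total += ni * p * q * (f_hat - fi) ** 2
    return total
```

With 20 and 30 responders out of 50 and fitted logits of −0.35 and 0.35, the two statistics agree to within a fraction of a per cent. Larger discrepancies between p_i and P_i make the approximation coarser.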


5. COMPUTING PROCEDURE 


Under conditions of simple similar action the solution of equations (18) or (25) must involve
a process of iteration. At least five variables (corresponding to a mixture of two poisons)
must be taken into account, and the calculations necessary to obtain estimates of a
sufficiently high degree of accuracy are likely to be too complex for the normal techniques of
statistical computing. The use of an automatic digital calculator does, however, offer the
possibility of the application of either method of estimation when it is necessary to analyse
a large number of sets of data.

Whatever the iterative procedure employed, the necessary computations must take the 
same basic form. For any given set of data the calculation of initial approximations for the 
parameters must be followed by the repeated application of a procedure of successive 
approximation. The choice of a particular process must depend both on the effort required 
to complete each cycle of iteration and also on the number of cycles necessary to achieve 
the desired accuracy.

Consideration has been given to three possible methods of iteration: the systematic
variation of each parameter in turn, a variant of the 'steepest descent' method, and the
Newton-Raphson method. The systematic variation of the parameters involves the
calculation of the extremum of the likelihood or logit χ² function with respect to each of the
parameters in turn, the values of the other parameters being held constant. A considerable
effort is therefore required at each cycle of approximation. The variant of the 'steepest
descent' method is based on the variation of all the parameters simultaneously along the
normal to the likelihood or logit χ² function in the 'parameter space', the extremum being
determined by reference to the quadratic approximation passing through three specified
points on the normal. The choice of these three points depends only on the values of the
function and its first partial derivatives with respect to the parameters, and the computing
of each cycle of iteration is considerably less complicated than that required for the
systematic variation of the parameters. A major disadvantage is, however, the comparatively
slow rate of convergence, which is associated with a tendency for successive approximations
to oscillate about the final result. The Newton-Raphson method, which is normally employed
to obtain the maximum likelihood estimates in the case of quantal responses to a single
poison, has therefore been preferred. Although this method involves the calculation of the
second derivatives of the function at each cycle of iteration, the convergence of successive
estimates to the final result is sufficiently rapid to offset this disadvantage.

Biom. 45

The Newton-Raphson process is based on the n-variable form of Taylor's theorem. For
any given approximation u_0, the increments ε_v for the next cycle are given by (2w + 1)
simultaneous linear equations of the form

$$\frac{\partial F(u_0)}{\partial u} + \sum_{v} \varepsilon_v \frac{\partial^2 F(u_0)}{\partial u\,\partial v} = 0, \qquad (27)$$

where F is the likelihood or logit χ² function. Thus, each cycle of iteration involves the
calculation of (2w + 1) first derivatives and (2w + 1)(w + 1) second derivatives of F.


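The values of ∂L/∂u for the maximum likelihood procedure are given in equation (18).
That for ∂²L/∂u∂v is

$$\frac{\partial^2 L}{\partial u\,\partial v} = \sum_{i=1}^{k} n_i \left[\frac{p_i - P_i}{P_i Q_i} \frac{\partial^2 P_i}{\partial u\,\partial v} - \left(\frac{p_i}{P_i^2} + \frac{q_i}{Q_i^2}\right) \frac{\partial P_i}{\partial u} \frac{\partial P_i}{\partial v}\right].$$

For the minimum logit χ² procedure, we have, from (25),

$$\frac{\partial^2 \chi^2}{\partial u\,\partial v} = 2 \sum_{i=1}^{k} n_i p_i q_i \left[\frac{\partial f_i}{\partial u} \frac{\partial f_i}{\partial v} - (\hat{f}_i - f_i) \frac{\partial^2 f_i}{\partial u\,\partial v}\right] \qquad (28)$$

and

$$\frac{\partial^2 \chi^2}{\partial u^2} = 2 \sum_{i=1}^{k} n_i p_i q_i \left[\left(\frac{\partial f_i}{\partial u}\right)^2 - (\hat{f}_i - f_i) \frac{\partial^2 f_i}{\partial u^2}\right]. \qquad (29)$$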


In general, the first and second derivatives of the likelihood function are considerably
more complicated than the corresponding derivatives of the logit χ² function, even if the
expected values of the second derivatives are substituted for the observed values [as is 
commonly done in the analysis of quantal responses to a single poison]. 

The assumption of a logistic distribution of tolerances (which is implicit in the minimum
logit χ² procedure) permits an appreciable simplification in the expressions for the partial
derivatives of the likelihood function. Under these circumstances

$$\frac{\partial L}{\partial u} = \sum_{i=1}^{k} n_i (p_i - P_i) \frac{\partial f_i}{\partial u}, \qquad (30)$$

$$\frac{\partial^2 L}{\partial u\,\partial v} = \sum_{i=1}^{k} n_i \left[(p_i - P_i) \frac{\partial^2 f_i}{\partial u\,\partial v} - P_i Q_i \frac{\partial f_i}{\partial u} \frac{\partial f_i}{\partial v}\right] \qquad (31)$$

and

$$\frac{\partial^2 L}{\partial u^2} = \sum_{i=1}^{k} n_i \left[(p_i - P_i) \frac{\partial^2 f_i}{\partial u^2} - P_i Q_i \left(\frac{\partial f_i}{\partial u}\right)^2\right]. \qquad (32)$$

Although the expressions (30), (31) and (32) associated with the maximum likelihood
procedure are more complex than the corresponding expressions (25), (28) and (29)
for the minimum logit χ² procedure, in that they involve the 'theoretical' probabilities of
response P_i rather than the observed equivalent deviations \hat{f}_i, the difference is marginal.
Thus, for practical purposes, the computations necessary to obtain the maximum likelihood
and minimum logit χ² estimates are comparable, and the relative advantage which the latter
method has in the case of quantal responses to a single poison does not hold for mixtures
of poisons.
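For a single poison with f_i = a + b log x_i, the second derivatives of f vanish. The expressions (30) to (32) then reduce to the familiar Newton-Raphson iteration for logistic regression. The Python sketch below applies (27) with these derivatives and solves the 2 × 2 system by Cramer's rule. The function name, starting values and stopping tolerance are assumptions. For a mixture of poisons, the same loop would need the derivatives of (12) in place of those of the linear f.

```python
import math


def ml_logistic_newton(n, r, x, a0, b0, tol=1e-10, max_iter=50):
    """Maximum likelihood a, b for f = a + b*log(x) under the logistic model.

    Gradient from (30) and second derivatives from (31)-(32), with the terms in
    the second derivatives of f dropped because f is linear in a and b.
    """
    a, b = a0, b0
    for _ in range(max_iter):
        ga = gb = haa = hab = hbb = 0.0
        for ni, ri, xi in zip(n, r, x):
            z = math.log(xi)
            P = 1.0 / (1.0 + math.exp(-(a + b * z)))
            resid = ri - ni * P              # n_i (p_i - P_i)
            wt = ni * P * (1.0 - P)          # n_i P_i Q_i
            ga += resid
            gb += resid * z
            haa -= wt
            hab -= wt * z
            hbb -= wt * z * z
        det = haa * hbb - hab * hab
        # Solve [haa hab; hab hbb] [da, db] = -[ga, gb], i.e. equations (27).
        da = (-ga * hbb + gb * hab) / det
        db = (-gb * haa + ga * hab) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b
```

On symmetric data with 2, 5 and 8 responders out of 10 at log doses −1, 0 and 1, the iteration starting from (0, 1) converges to a = 0 and b = log 4. Here the maximum likelihood and minimum logit χ² estimates coincide.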


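Both methods of estimation involve the calculation of the first and second partial
derivatives of the equivalent deviation, which may be obtained as follows. From (12),

$$f(x, y, \ldots, t) = \theta \log_h\left[\sum_{r=x}^{t} h^{(a_r + b_r \log_h r)/\theta}\right] = \theta \log_h e \cdot \log_e\left[\sum_{r=x}^{t} \exp\left(\frac{a_r + b_r \log_h r}{\theta \log_h e}\right)\right].$$

Thus, for l = x, y, ..., t,

$$\frac{\partial f}{\partial a_l} = \frac{\exp\{(a_l + b_l \log_h l)/(\theta \log_h e)\}}{\sum_{r=x}^{t} \exp\{(a_r + b_r \log_h r)/(\theta \log_h e)\}} = H_l, \qquad (33)$$

$$\frac{\partial f}{\partial b_l} = H_l \log_h l.$$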


Similarly, the remainder of the first and second partial differential coefficients may be
generated by a series of relatively simple arithmetic operations, and the substitution of
the expected values of the second derivatives for the observed values would lead to a
relatively small reduction in the calculation of the Newton-Raphson equations. The use of
this approximation tends to decrease the speed of convergence of the iterative process and,
on balance, it is considered that the observed values should be employed.
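The derivatives in (33) can be verified against finite differences. The sketch below is an illustration, not the author's program. It returns f from (12) together with ∂f/∂a_l = H_l and ∂f/∂b_l = H_l log_h l. The derivative with respect to θ, (f − Σ_l H_l(a_l + b_l log_h l))/θ, is not given in the text; it was obtained here by differentiating (12) in the same way and should be treated as our own derivation. Doses are assumed positive so that log_h l is finite.

```python
import math


def f_and_gradient(doses, a, b, theta, base=10.0):
    """Equation (12) and its first derivatives with respect to a_l, b_l and theta.

    Returns (f, df/da, df/db, df/dtheta). df/da_l = H_l and df/db_l = H_l log_h l
    follow (33); the theta derivative is derived here, not taken from the text.
    """
    l_vals = [ar + br * math.log(r, base) for r, ar, br in zip(doses, a, b)]
    terms = [base ** (lv / theta) for lv in l_vals]
    total = sum(terms)
    f = theta * math.log(total, base)
    H = [t / total for t in terms]                    # H_l of (33); they sum to 1
    d_a = H
    d_b = [Hl * math.log(r, base) for Hl, r in zip(H, doses)]
    d_theta = (f - sum(Hl * lv for Hl, lv in zip(H, l_vals))) / theta
    return f, d_a, d_b, d_theta
```

Because the H_l sum to one, Σ_l ∂f/∂a_l = 1. This property is used below in deriving the initial value of θ.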

The speed of convergence of the procedure will depend on the choice of initial approximations
a_r^0, b_r^0 and θ^0. As a general rule it is possible to obtain values for a_r^0 and b_r^0 by carrying out
an analysis on the marginal distribution of r. In view of the relative simplicity of the
calculations under the minimum logit χ² procedure the use of this method is to be preferred,
whether the final method of estimation is to be maximum likelihood or minimum
logit χ².

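The calculation of θ^0, given initial approximations for a_r and b_r, may be carried out as
follows. From (25),

$$\sum_{i=1}^{k} n_i p_i q_i (\hat{f}_i - f_i) \frac{\partial f_i}{\partial a_r} = 0 \qquad (34)$$

at the minimum of the χ² function. Summing over all values of u = a_r (r = x, y, ..., t) we have

$$\sum_{i=1}^{k} n_i p_i q_i (\hat{f}_i - f_i) \sum_{r=x}^{t} \frac{\partial f_i}{\partial a_r} = 0.$$

From (33), Σ_r ∂f/∂a_r = 1; hence

$$\sum_{i=1}^{k} n_i p_i q_i (\hat{f}_i - f_i) = 0. \qquad (35)$$

Further, from (13),

$$f_i \approx \frac{1}{w} \sum_{r=x}^{t} (a_r + b_r \log_h r_i) + \theta \log_h w,$$

where, in this expression and the next, r_i stands for the dose of poison R received by the ith group.
Thus, a first approximation for θ may be obtained by means of the expression

$$\theta^0 = \frac{\sum_{i=1}^{k} n_i p_i q_i \left[\hat{f}_i - \frac{1}{w} \sum_{r=x}^{t} (a_r^0 + b_r^0 \log_h r_i)\right]}{\log_h w \sum_{i=1}^{k} n_i p_i q_i}. \qquad (36)$$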


This process is based on the initial estimates a_r^0 and b_r^0 and involves two further stages of
approximation. The estimate of θ may therefore be subject to considerable error,
particularly if a number of the (a_r + b_r log r) differ markedly, and it may well be considered
preferable to base the initial value on previous experience.
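Equation (36) is a single weighted average and is easily computed. In the sketch below (our own illustration), the observed logits use natural logarithms, matching (5), while doses enter through log_h with h = 10 by default. The function name, these choices and the skipping of groups with 0% or 100% response are assumptions. The argument `responders` holds the r_i counts of Section 4, kept separate from the doses to avoid the clash of notation.

```python
import math


def theta_initial(n, responders, doses, a0, b0, base=10.0):
    """Equation (36): first approximation to theta.

    doses[i] lists the component doses of group i (all positive); a0, b0 are the
    starting values of the single-poison parameters a_r, b_r.
    """
    w = len(a0)
    num = den = 0.0
    for ni, ri, di in zip(n, responders, doses):
        p = ri / ni
        q = 1.0 - p
        if p <= 0.0 or q <= 0.0:
            continue                                  # no finite logit: skipped here
        f_hat = math.log(p / q)                       # observed logit
        mean_l = sum(ar + br * math.log(dr, base)
                     for dr, ar, br in zip(di, a0, b0)) / w
        weight = ni * p * q
        num += weight * (f_hat - mean_l)
        den += weight
    return num / (den * math.log(w, base))
```

If the observed logits happen to equal (13) exactly for some θ, the sketch returns that θ. In practice the result is only a starting value for the Newton-Raphson iteration.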


6. CONCLUSIONS 


In theory, the assumed form of tolerance distribution should conform exactly to the 
true relationship between the equivalent deviation and the probability of response. In 
practice, however, this true relationship is not known and the choice of a particular dis- 
tribution must be based on other considerations. Examination of the published work on 
quantal responses to a single poison shows that the method of probit analysis is generally 
preferred. The assumption of a normal distribution of tolerances has received wide support, 
mainly on the grounds that the distribution probably provides a realistic description of 
the fundamental processes which take place when a poison is applied. However, the 
logistic distribution agrees very closely with the normal distribution for response rates
between 0.01 and 0.99, which covers the whole of the working range of responses, and it is
unlikely that sufficient information would ever be available to discriminate between the
two. Even if the true distribution of tolerances were, in fact, normal, any conclusions based
on the assumption of a logistic distribution would be unlikely to be seriously affected. In
the absence of any strong theoretical evidence it is therefore considered that the logistic
distribution is to be preferred, in view of the fact that the calculations involved in the
estimation of the parameters are less complicated than those associated with the normal
distribution. Under this assumption both maximum likelihood and minimum logit χ²
estimation may be employed.
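The closeness of the two distributions can be checked numerically. The standard logistic of (5) has a larger spread than the unit normal, so the comparison needs a scale factor. The sketch below assumes the conventional factor 1.702, which is not given in the paper. It reports the largest absolute difference between the two response curves on a grid of equivalent deviations.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z, scale=1.702):
    """Logistic curve with its argument multiplied by `scale` (assumed matching constant)."""
    return 1.0 / (1.0 + math.exp(-scale * z))


def max_discrepancy(lo=-5.0, hi=5.0, steps=10001):
    """Largest |normal - scaled logistic| over an evenly spaced grid of z values."""
    worst = 0.0
    for i in range(steps):
        z = lo + (hi - lo) * i / (steps - 1)
        worst = max(worst, abs(normal_cdf(z) - logistic_cdf(z)))
    return worst
```

With this scaling, the two curves never differ by as much as 0.01 in probability of response, which is consistent with the statement above.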

The relative merits of the two methods of estimation have formed the subject of
considerable controversy. The minimum logit χ² estimates belong to the class of Regular Best
Asymptotic Normal estimates, in the sense of Neyman (1949), and thus have the same
asymptotic properties as the maximum likelihood estimates. In the case of quantal
responses to a single poison, the behaviour of the two methods when applied to 'small'
samples (of the sizes that are normally employed in biological assay) has been considered.
It has been shown by Miller (1950), Berkson (1952) and Armitage & Allen (1950) that
corresponding estimates agree extremely closely for a wide variety of experimental data, provided
the true distribution of tolerances does not show any marked departure from the logistic
distribution. Where there are any appreciable differences between the results obtained,
these may be attributed to the procedures adopted for handling response rates of zero or
100%, for which the minimum logit χ² estimates become indeterminate.

Doubts have been expressed (Silverstone, 1956) about the validity of the indiscriminate
application of the 1/(2n)-rule proposed by Berkson (1955) to overcome this difficulty. It
has been shown that the estimates obtained by the application of this rule may not be
statistically sufficient and that the justification of this property given by Berkson is open
to question. A further objection is the tendency for the value of χ² to become unstable
when the number of individuals at any particular dose is small. The difficulty of handling
response rates of zero and 100% may be partially overcome by exercising reasonable
discretion in the use of the 1/(2n)-rule, but the instability of the values of χ² is a more serious
disadvantage. In circumstances where the experimental conditions are not subject to
control by the investigator, small numbers of individuals may occur at any of the levels of
dosage. The difficulty may be offset by taking together the groups with similar levels of
dosage, but this process must inevitably lead to some loss of information, and the need to
pool groups which contain a small number of subjects is thus a definite objection. On general
grounds the minimum logit χ² procedure has no intrinsic advantages over the maximum
likelihood procedure as a method of estimation and, as the calculations involved under the 
assumption of a logistic distribution of tolerances are comparable, it is considered that the 
method of maximum likelihood is to be preferred. 

The reasons for selecting the logistic function to represent the tolerance distribution 
and the maximum likelihood method of estimation apply whatever the assumed form of the 
equivalent deviation, provided this is not a linear function of the parameters to be estimated.
Thus, the advantages of the method of approach hold good under general conditions of 
similar joint action, whether or not the poisons interact. Indeed, the use of an automatic 
digital calculator offers the possibility of analysing data of this type, even though the 
model for the equivalent deviation may be extremely complicated. 


7. EXAMPLE 


The application of the procedures described above may be illustrated by an example from 
the field of research into the causes of pneumoconiosis amongst coal miners. Experience 
has shown that there are, in general, three distinct levels of dust concentration, differing 
both in average value and variability. They are associated with work on the coal-face, on the 
coal-getting and preparation shifts, and elsewhere underground, respectively. The chance 
of having pneumoconiosis is a function of the periods spent in these three types of environ- 
ment. Thus, if ‘response’ is regarded as having pneumoconiosis and ‘dose’ is measured by 
the periods spent on the coal-face, on the coal-getting and preparation shifts, and elsewhere 
underground, the situation is equivalent to quantal responses to a mixture of three poisons 
under conditions of simple similar action. 

The data obtained at one particular colliery are illustrated in Table 1, which shows that 
the majority of the men had spent an appreciable period of time in more than one class of 
environment. To reduce the number of groups under consideration the results for men with
a similar level of exposure have been pooled, and the values for the periods spent in the
three types of environment are weighted averages.

The initial estimates of a₁ and b₁ were obtained by carrying out a minimum logit χ²
analysis for x only, for the groups which cover men with a comparatively short period of
exposure in the other environments. Similar analyses were performed to obtain the initial
estimates of a₂, b₂, a₃ and b₃. The calculated values are

    a₁ = −6·9,   a₂ = −6·0,   a₃ = −5·2,
    b₁ = 5·2,    b₂ = 4·0,    b₃ = 2·9.

The estimate of θ obtained by means of equation (36) is

    θ = 7·5.


Table 1. Exposure to dust and incidence of pneumoconiosis for groups of mine workers 


Quantal responses to mixtures of poisons


Period spent (years)
Coal-face,        Coal-face,       Elsewhere
coal-getting      preparation      underground
(x)               (y)              (z)
0·25 0·25 0·25
0·25 0·25 1·74
0·25 0·25 5·15
0·25 0·25 10·65
0·25 0·25 24·31
0·25 0·25 37·44
1·29 0·25
1·37 3·60
0·25 0·25
4·19 0·25
0·25 2·04
0·25 6·11
0·25 44·25
0·25 32·62
1·22 23·00
0·25 20·71
4·06 1·25
7·60 2·73
0·25 13·94
17·14 1·43
4·50 19·00
9·62 23·25
16·86 14·71
4·12 5·38
8·70 0·25
0·25 4·53
0·25 37·91
0·25 13·21
0·25 22·75
0·25 0·25
26·50 10·67
31·20 0·80
0·25 9·91
0·25 26·75
0·25 0·25
0·25 21·23
15·25 0·25
0·25 35·44
0·25 3·33
0·25 15·67
7·26 3·44
3·40 0·25
2·94 9·29
8·50 17·88
3·18 24·09
16·33 2·00
7·92 0·25
2·55 2·50
16·60 10·20


10·00 0·25 6·00
10·45 0·25 32·55
10·56 0·25 2·62
10·60 0·25 20·70
10·86 0·25 11·36
11·04 0·25 0·25
13·67 15·67 0·25
14·00 2·75 7·75
14·00 6·90 7·40
14·00 7·84 0·25
14·25 0·25 27·50
14·33 0·25 0·25
14·38 2·62 10·88
14·43 0·25 5·93
14·62 3·17 0·25
14·63 0·25 1·68
15·08 3·17 2·83
15·20 0·25 16·00
15·44 14·56 3·67
18·43 0·25 2·93
18·60 0·25 0·25
18·64 0·25 9·45
18·75 0·25 20·83
21·82 0·25 6·64
22·36 0·25 1·91
22·48 0·25 0·25
22·50 0·25 18·17
23·41 2·88 0·25
23·56 9·89 0·25
23·80 2·40 16·00
24·00 2·80 2·80
24·17 12·17 1·67
25·17 12·00 8·50
26·08 0·25 1·07
26·36 0·25 5·73
26·45 0·25 0·25
26·50 0·25 12·33
30·63 0·25 0·25
30·64 0·25 2·36
30·67 0·25 6·83
33·83 4·67 7·07
34·86 0·25 9·57
36·27 0·25 1·91
35·47 0·25 0·25
9·67 0·25
4·57 3·14
0·25 8·38
0·25 2·88
0·25 0·25
Starting from these initial values, the Newton-Raphson procedure was applied to obtain
the maximum likelihood estimates based on the logistic distribution of tolerances, by means
of expressions (27), (30) and (31), and the variance-covariance matrix was determined by
means of expressions (19) and (32). The iterative process was discontinued when the first-order
partial derivatives of the likelihood function with respect to each of the parameters
were zero to five decimal places. The pattern of the computations was typical of the majority
of analyses of this type, in that the initial and final cycles of iteration produced relatively
large changes in the parameter values, whereas the intermediate cycles had a much smaller
effect. The convergence of the final iterations was particularly rapid. The final estimates
of the parameters were as follows:


    â₁ = −6·77,   â₂ = −9·66,   â₃ = −4·55,
    b̂₁ = 5·09,    b̂₂ = 6·14,    b̂₃ = 2·30,
    θ̂ = 3·35.


It will be seen that the initial approximations were most accurate for a₁ and b₁, but that the
final value for θ differed considerably from the initial estimate.
The variances of the estimates are

    var â₁ = 0·679,   var â₂ = 14·335,   var â₃ = 0·229,
    var b̂₁ = 0·346,   var b̂₂ = 17·999,   var b̂₃ = 0·127,
    var θ̂ = 0·627.

The covariances of corresponding values of âᵢ, b̂ᵢ are

    cov(â₁, b̂₁) = −0·480,   cov(â₂, b̂₂) = −10·537,   cov(â₃, b̂₃) = −0·152.
It thus appears that the estimates relating to y (the period spent on the coal-face, preparation
shifts) are known with the least precision. On general grounds this is not unexpected,
in view of the variability of the dust concentrations associated with this environment.
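The Newton-Raphson step used above can be sketched for the simplest case of a single environment, assuming the logistic model P = 1/{1 + exp(−(a + b x))} with x = log dose. This is not the paper's three-environment model (expressions (27), (30) and (31) are not reproduced here); the function name, doses and group sizes are hypothetical. The stopping rule mirrors the text: iteration ends when every first derivative of the log-likelihood is below 5 × 10⁻⁶ in absolute value.

```python
import numpy as np


def fit_logistic_newton(x, n, r, a0, b0, tol=5e-6, max_iter=50):
    """Maximum likelihood fit of P = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson.

    x: log dose, n: group sizes, r: numbers responding; (a0, b0) are the
    starting values.  Iteration stops when every first derivative of the
    log-likelihood is below `tol` in absolute value.  Returns the estimates
    and the inverse of the information matrix (approximate covariances).
    """
    x = np.asarray(x, dtype=float)
    n = np.asarray(n, dtype=float)
    r = np.asarray(r, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    theta = np.array([a0, b0], dtype=float)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        score = X.T @ (r - n * p)                          # first derivatives
        info = X.T @ (X * (n * p * (1.0 - p))[:, None])    # minus second derivatives
        if np.max(np.abs(score)) < tol:
            return theta, np.linalg.inv(info)
        theta = theta + np.linalg.solve(info, score)       # Newton-Raphson step
    raise RuntimeError("Newton-Raphson iteration did not converge")


# Hypothetical single-environment data: expected responses under a = -6.77, b = 5.09.
x = np.log10([5.0, 10.0, 20.0, 40.0, 80.0])
n = np.full(5, 100.0)
r = n / (1.0 + np.exp(-(-6.77 + 5.09 * x)))
theta, cov = fit_logistic_newton(x, n, r, a0=-6.9, b0=5.2)
print(theta, np.sqrt(np.diag(cov)))
```

Because the responses are set to their expected values, the fit should return the generating values, which makes the sketch easy to check.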

The hazard relating to a particular environment may conveniently be expressed in terms
of the ED50, the 'dose' at which 50% of the population may be expected to manifest the
characteristic response if the men concerned were exposed only to the given environment.
If this quantity is denoted by $\xi_{i50}$ and $X_{i50} = \log \xi_{i50}$, we have, from (3) and (5),

$$\hat X_{i50} = \log \hat\xi_{i50} = -\hat a_i/\hat b_i.$$


The 95% fiducial limits for the ED50's are

    X̂₁,₅₀ = 1·33 ± 0·05,  i.e. ξ̂₁,₅₀ = 21 years (limits … and 24 years),
    X̂₂,₅₀ = 1·57 ± 0·32,  i.e. ξ̂₂,₅₀ = 37 years (limits … and 79 years),
and X̂₃,₅₀ = 1·98 ± 0·31,  i.e. ξ̂₃,₅₀ = 95 years (limits 47 and 193 years).
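The point estimates can be checked from the final âᵢ and b̂ᵢ with X̂ᵢ₅₀ = −âᵢ/b̂ᵢ. A short sketch, assuming logarithms to base 10 and reading â₃ as −4·55 (both assumptions, though together they reproduce the 21, 37 and 95 years quoted); it does not attempt the fiducial limits.

```python
# Final estimates (a_i, b_i) quoted above; x = coal-getting, y = preparation,
# z = elsewhere underground.  Base-10 logarithms are an assumption here.
estimates = {"x": (-6.77, 5.09), "y": (-9.66, 6.14), "z": (-4.55, 2.30)}


def ed50_years(a, b):
    """Return (log ED50, ED50) for a response that is logistic in a + b*log10(dose)."""
    log_ed50 = -a / b            # 50 % point: a + b*X = 0
    return log_ed50, 10 ** log_ed50


for env, (a, b) in estimates.items():
    log_ed50, ed50 = ed50_years(a, b)
    print(f"{env}: log ED50 = {log_ed50:.3f}, ED50 = {ed50:.0f} years")
```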


The value of χ² relating to the final estimates of the parameters is …, which corresponds
to a significance level of 7%. Only 9 of the 98 components of χ² relating to the
individual groups are in excess of 3·0. There is thus no evidence that the observed data
deviate significantly from the hypothetical model on which the analysis is based.


This paper is published by permission of the National Coal Board. 
SOME PROPERTIES OF RUNS IN QUALITY CONTROL PROCEDURES


By P. G. MOORE 
University College London 


1. The usual type of control chart for average values is based on the means of successive 
samples of some fixed size, n. The limits commonly placed on the chart correspond to
probabilities such as 0·998 or 0·99 that the sample mean falls inside the limits when the system
is operating satisfactorily. If the system goes out of control due to the average level altering, 
the two extreme situations that may have occurred are: 

(i) The average value of the population of items being manufactured may change slightly
due to tool wear, or a fresh batch of raw material, or a slight variation in the voltage of
the power supply and so forth.

(ii) A large change in the average value takes place due to something going wildly out of
control. This happens when mistakes are made in the manufacturing process, such as bolts 
being left undone in some machine or wrong methods being used by one of the operators. 

There are other faults that could occur which would affect the standard deviation of the 
manufactured items but we shall assume here that the standard deviation remains constant 
throughout. 

Usually the economics behind any inspection scheme limits the amount of sampling that
can be done, and it is necessary to decide whether to take, on the one hand, small samples
fairly frequently or, on the other hand, large samples less frequently. A large
sample has more chance of picking out a change of type (i), whereas changes of type (ii) would
be easily picked out by both large and small samples. As small samples are taken more
frequently, changes of type (ii) would, on average, be detected sooner by such samples.


2. Since small samples have desirable properties for type (ii) changes, it is worth seeing
whether their performance for type (i) changes can be improved. One method of approach
is to consider the sample means in bunches rather than in isolation. Mosteller (1941)
considered the number of runs of means that were above or below the median, and Olmstead
(1946) used the number of runs up and runs down as his basis for a test. Weiler (1953)
suggests that we should stop production when a specified number, T, of means in succession
fall beyond the control limits set up for the scheme. This latter rule is the one that we follow
here. There are other possible rules based on using two limits, warning and action, in
conjunction with runs. Dudding & Jennett (1944) gave a brief discussion of the possibilities
and Page (1955) gave some tables for such schemes. Some comparisons with this type of
scheme are made in § 6 below.

3. If the rule adopted is to stop production when T successive means fall beyond a specified
control limit, the appropriate position for this limit (or pair of limits) will vary with T.
There are several ways in which the effects of schemes with different values of T can be
equated, so that the schemes can be regarded as equivalent when the population
average remains constant. The method adopted here is to make the average number of
samples drawn before a stoppage occurs the same whatever the value of T that is being used.
If p is the probability that one single mean value falls beyond the control limit, then the
average number of means that must be observed before a run of T successive means falls
outside the limit is

$$\frac{1-p^{T}}{(1-p)\,p^{T}}. \qquad (1)$$


This result is derived, for example, in Feller (1950, p. 266). If T is equal to unity, (1) reduces
to 1/p. Having chosen this value of p in the customary control chart manner, we now choose
for each value of T a value, $p_T$, such that

$$\frac{1-p_T^{\,T}}{(1-p_T)\,p_T^{\,T}} = \frac{1}{p}. \qquad (2)$$

Let the mean and standard deviation of the controlled population be μ and σ, respectively.
Then if the upper control limit for the mean of a sample of n is put at μ + λσ/√n, where λ is
chosen to make the probability of falling beyond the limit equal to p, we would have to find
a new value $\lambda_T$ such that $p_T$ is the probability of one mean falling beyond μ + $\lambda_T$σ/√n.
Equation (2) can be solved to give the values of $p_T$ that correspond to p for various values
of T, and hence $\lambda_T$ is found. In the next two sections we consider in particular T equal to two,
three and four. Longer sequences than four would not in general be acceptable, because
such sequences impose quite a delay before a change in mean can be detected. The values
of $p_T$ and $\lambda_T$ obtained by this method for the two sets of limits considered in the next section
are:


Average number of samples examined before production is stopped unnecessarily,
i.e. when there is no change in population mean

                  1000                                 200
T        1       2        3        4          1       2        3        4
p_T    0·001   0·0321   0·1037   0·1873     0·005   0·0732   0·1825   0·2891
λ_T    3·090   1·850    1·261    0·888      2·576   1·452    0·906    0·556
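The tabulated $p_T$ and $\lambda_T$ can be reproduced by solving (2) numerically. The sketch below (function names hypothetical) uses bisection, which is safe because (1) can be rewritten as p⁻¹ + p⁻² + ... + p⁻ᵀ, a decreasing function of p; $\lambda_T$ is then the one-sided standard normal deviate for $p_T$.

```python
from statistics import NormalDist


def arl_run(p, T):
    """Equation (1): mean number of means observed before T successive ones
    fall beyond a limit that each exceeds with probability p."""
    return (1 - p ** T) / ((1 - p) * p ** T)


def equivalent_limit(p1, T, tol=1e-12):
    """Solve equation (2) for p_T by bisection; return (p_T, lambda_T)."""
    target = 1.0 / p1
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if arl_run(mid, T) > target:
            lo = mid          # run length too long, so p_T must be larger
        else:
            hi = mid
    p_T = 0.5 * (lo + hi)
    return p_T, NormalDist().inv_cdf(1.0 - p_T)   # one-sided deviate lambda_T


for p1 in (0.001, 0.005):
    print([tuple(round(v, 4) for v in equivalent_limit(p1, T)) for T in (1, 2, 3, 4)])
```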


… changed mean, at least one run of T … approach gives an overall figure, whilst the second,
considered as a function of m, gives a more complete picture as to the … drawn.

For (a) a revised value of $p_T$ can be calculated after any given shift has occurred in the
mean value, and equation (1) can then be used to find the average run before T successive
means fall beyond the control limit.
… and 20, is fairly tedious, and for large m it is essential to seek some other method. Feller
(1950, chapter 13) develops a method for obtaining the probability that in m trials the first
run of length T occurs at the mth trial. He gives this probability, $f_m$, as

$$f_m \approx \frac{(z-1)(1-pz)}{(T+1-Tz)(1-p)}\,\frac{1}{z^{m+1}}, \qquad (3)$$

where p is the probability of one individual being beyond the control limit and z is the unique
positive root of

$$(1-p)\,z\,\big(1 + pz + \dots + p^{T-1}z^{T-1}\big) = 1. \qquad (4)$$

Hence, our required probability of no runs of length T in m trials, $q_m$, is the sum of (3) over
all trials after the mth, and this gives the expression

$$q_m \approx \frac{1-pz}{(T+1-Tz)(1-p)}\,\frac{1}{z^{m+1}}. \qquad (5)$$
Some experiments were made to test the accuracy of this approximation for m = 10 and
for runs of two, three and four. As Feller himself has demonstrated the accuracy when p = ½,
attention was concentrated on the cases when p < ½. The exact values were obtained by
enumeration of the possible runs and the approximate results obtained from (5). The latter
were very good and quite adequate for the purpose in hand. Some specimen figures obtained
are given below and show only minor discrepancies in the fourth decimal place.


         T = 2          T = 3          T = 4
p      Exact q_m      Approx. q_m    Approx. q_m
0·4    0·7103         0·7100         0·1167
0·3    0·5036         0·5036         0·0420
0·2    0·2733         0·2734         0·0093
0·1    0·0803         0·0803         0·0006
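The comparison of exact values with Feller's approximation can be repeated with the sketch below (function names hypothetical). The exact $q_m$ comes from a small recursion on the length of the current run, used here instead of enumerating sequences, and the approximation uses (5) with z found by bisection on (4). On our reading, the tabulated figures for T = 2 are probabilities of at least one run, 1 − $q_m$; for example, the recursion gives 0·5036 for p = 0·3 and 0·2733 for p = 0·2.

```python
def prob_no_run_exact(p, T, m):
    """Exact q_m: probability that m means contain no run of T successive
    means beyond the limit, each mean falling beyond it with probability p."""
    # state[k] = P(no run of length T yet, and the current run has length k)
    state = [1.0] + [0.0] * (T - 1)
    for _ in range(m):
        new = [0.0] * T
        new[0] = (1 - p) * sum(state)      # a mean inside the limit ends the run
        for k in range(T - 1):
            new[k + 1] = p * state[k]       # a mean beyond the limit lengthens it
        state = new                          # a run reaching length T is absorbed
    return sum(state)


def prob_no_run_feller(p, T, m):
    """Feller's approximation (5), with z the unique positive root of (4)."""
    q = 1 - p

    def g(s):                                # left side of (4) minus 1
        return q * s * sum((p * s) ** k for k in range(T)) - 1

    lo, hi = 1.0, 2.0                        # g(1) = -p**T < 0 and g is increasing
    while g(hi) < 0:
        hi *= 2
    for _ in range(200):                     # bisection
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    z = 0.5 * (lo + hi)
    return (1 - p * z) / ((T + 1 - T * z) * q) / z ** (m + 1)


for p in (0.4, 0.3, 0.2, 0.1):
    exact = 1 - prob_no_run_exact(p, 2, 10)
    approx = 1 - prob_no_run_feller(p, 2, 10)
    print(p, round(exact, 4), round(approx, 4))
```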


5. In order to compare different sizes of sample, different lengths of run and different
changes in the mean simultaneously, it is simplest to keep one of the possible variables fixed.
In this case it was decided to relate the change in the mean to the size of the sample, n, that
is being used, and to consider changes in the mean of three magnitudes, namely

    0·5σ/√n,   1·0σ/√n,   1·5σ/√n.

By doing this the results become, for the purposes of these illustrations, independent of the
size of sample since, whatever the shift, the probability of falling beyond the limits specified
is the same for all sizes of sample.

Two significance limits were utilized, described as inner and outer. Limits μ + $\lambda_T$σ/√n
were chosen as described in § 3, such that, if there was no shift in population mean, the
average number of samples of n which must be observed before a run of T means exceeded
the corresponding limit was (a) 200 for the inner limit, (b) 1000 for the outer limit. Figs. 1-6
show the probabilities that after a specified number of samples, m, have been drawn there
will have been at least one stoppage due to T successive means falling beyond the level
prescribed. The figures show that there is always some gain in using a higher value of T,
although the actual gain obtained when increasing T by one decreases as T gets larger.
As a rough and ready guide it appears that T = 2 is the most suitable value to use. It has
great advantages over T = 1 and avoids the inevitable minimum delay in picking out a
large change which must occur when a decision has to await at least three (or four) sample
results.
Fig. 1. Use of inner limit with shift of 0·5σ/√n.
Fig. 2. Use of outer limit with shift of 0·5σ/√n.
Fig. 3. Use of inner limit with shift of σ/√n.
Fig. 4. Use of outer limit with shift of σ/√n.
Fig. 5. Use of inner limit with shift of 1·5σ/√n.
Fig. 6. Use of outer limit with shift of 1·5σ/√n.
Table 1 gives comparative results in terms of average number for each of the schemes
illustrated. For very large shifts it will be noticed that the larger values of T seem unsuit-
Scheme I. Samples of size 5 are used.
(a) Stop production if 3 consecutive sample means fall between the warning and the
action limits, or if any one mean falls beyond the action limits:
    warning limits μ ± 1·25σ/√5,
    action limits μ ± 3·00σ/√5,
where μ and σ are the mean and standard deviation of the controlled scheme.
(b) Alternatively, we stop the machine if a run of T means falls beyond some limit, the
limit depending on T and being chosen in such a way that the average run length for the
system when in control is always the same as in (a).


Table 1. Average number of samples of n observed before stopping

Shift in mean   Limit used   T = 1    T = 2    T = 3
0·5σ/√n         Inner         52·8     40·2     36·4
                Outer        208·5    139·1    114·3
1·0σ/√n         Inner         17·4     12·5     11·8
                Outer         54·6     30·7     24·8
1·5σ/√n         Inner          7·1      5·6      5·9
                Outer         17·9     10·3      9·3

Table 2. Average run lengths for Scheme I

                                  Value of k
Scheme                       0      0·2     0·4     0·6
(a) (Page)                  503     285     102      43
(b) T = 1, λ₁ = 2·58        503     281     108      46
    T = 2, λ₂ = …           503     226      78      35
    T = 3, λ₃ = …           503     207      72      35
    T = 4, λ₄ = 0·56        503     199      72      38


Table 2 gives the A.R.L.'s before a stoppage occurs when the mean of the population is
changed by an amount kσ. It is assumed that σ remains unaltered. The values for (a) are
taken from Page (1955). The averages are not always multiples of the sample size, 5, but
this is only a reflexion of the fact that they are just averages and the average value itself
cannot always be achieved. From Table 2 it seems that scheme (a) can always be bettered
by one of the (b) type schemes, but unless the value of k were known it would be impossible
to say which value of T should be used.
Scheme II. Samples of size 5 are again used.
(a) Stop production if 4 consecutive sample means fall between the warning and the
action limits, or if any one mean falls outside the action limits:
    warning limits μ ± 1·00σ/√5,
    action limits μ ± 3·00σ/√5.
(b) is the same as for Scheme I, adjusted so that the average run length when the system
is in control is the same as in (a). A comparison is shown in Table 3. The same characteristics
are apparent as for Scheme I, and it is impossible to nominate one scheme of type (b) that is
going to be always the best.


Table 3. Average run lengths for Scheme II

                                  Value of k
Scheme                       0      0·2     0·4     0·6
(a) (Page)                  527     305     112      47
(b) T = 1, λ₁ = 2·59        527     293     112      47
    T = 2, λ₂ = 1·47        527     235      80      36
    T = 3, λ₃ = 0·92        527     214      73      36
    T = 4, λ₄ = 0·57        527     206      73      38
Table 4. Values of λ_T in two-sided control limits

T         50       25
1       2·326    2·054
2       1·253    1·029
3       0·724    0·515
4       0·383    0·179


The results for these two schemes are in agreement with the results found by Weiler 
(1953) and it can be seen from his paper that in many cases there is an optimum value of T
if the average run length is to be kept to a minimum. 


In summary, it seems that the use of a system of runs can effect considerable improvements
in the properties of quality control schemes. Although the schemes discussed here
…
rule as to the most suitable value of T to use in all circumstances, but it is apparent that
increasing T from one to two will in general effect an improvement. Increasing T further
SIMULTANEOUS REGRESSION EQUATIONS IN EXPERIMENTATION


By E. J. WILLIAMS* 
Institute of Statistics, North Carolina State College 


1. INTRODUCTION


The purpose of this paper is to discuss the determination and interpretation of simultaneous 
equations fitted to experimental data. Although little has been written on simultaneous 
equations in experimentation, their uses in economics have frequently been discussed. In
that field, however, there is often no distinction between dependent and independent 
variables. In what is known in econometrics as a complete system of simultaneous equations, 
there are as many equations as endogenous variables, so that the equations consist of a 
linear transformation from the unknown disturbances and known exogenous variables to 
the observed variables. The treatment of simultaneous equations in econometrics is 
generally troublesome and depends on the completeness of the system of equations, and the 
identifiability of the parameters.

In experimental work, on the other hand, there is in many situations a clear distinction 
between the dependent and independent variables. Thus, the number of equations will be 
at most equal to the number of dependent variables. In this field, too, there is a case of 
particular interest, as will be shown below, which occurs when the numbers of dependent 
and independent variables are the same. The applications of simultaneous equations to 
experimental work seem to be quite important and are much more straightforward than 
those in econometrics, yet, strangely enough, they seem to have been little discussed. The 
only published work in this field with which we are familiar is that of Box & Hunter (1954), 
but even this relates to a different situation from that considered here, and to particular 
applications in experimental design.

We begin by discussing a simple application of simultaneous equations to experimental
work. Then will follow the mathematical theory, after which special cases will be discussed.
It seems that the relatively early introduction of the linear discriminant function has
diverted the attention of statisticians from simultaneous equations; it appears that in
many cases which are dealt with by discriminant functions, the set of simultaneous equations
(from which the discriminant function can be derived) is more informative.


…) discuss the quantitative determination of glucose and galactose simultaneously in
solutions of unknown chemical … studied, optical density for each sugar is proportional
to amount of sugar; then use is made
of the fact that each sugar differs in its density to light of different wavelengths. 

Solutions containing known amounts of glucose and galactose were prepared, and the 
density at two different wavelengths (470 and 560 mμ) determined. The data enable a
regression of density on amount of each sugar to be determined for each wavelength. These 
regressions then constitute a calibration of the apparatus, such that if optical densities 
for some unknown solution are substituted in the equations, the amount of each sugar can 
be estimated. 

Thus, if $y_1$ and $y_2$ are the optical densities at 470 and 560 mμ, respectively, and $x_1$ and
$x_2$ the amounts of each sugar (in milligrammes), the regression equations may be written

$$Y_1 = b_{11}x_1 + b_{21}x_2, \qquad Y_2 = b_{12}x_1 + b_{22}x_2. \qquad (1)$$

These equations have no constant term, since the optical densities are zero at zero
concentration of the sugars. In the practical use of these equations, the y's will be observed
values and the x's predicted. If the equations are solved for this purpose, we get

$$X_1 = b^{11}y_1 + b^{21}y_2, \qquad X_2 = b^{12}y_1 + b^{22}y_2, \qquad (2)$$

where the matrix
$$\begin{pmatrix} b^{11} & b^{12} \\ b^{21} & b^{22} \end{pmatrix}$$
is the inverse of the original matrix of regression coefficients,
$$\begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}.$$

The equations (2) will be called inverse regression equations, and the X-values inverse 
estimates. It will be seen that in practically every calibration problem, inverse estimates 
are required, since the quantities arbitrarily assigned in the calibration are unknown in 
the application to estimation. 
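As a sketch of the calibration-then-inversion procedure, assuming ordinary least squares through the origin for each wavelength: the functions and numbers below are hypothetical (noise-free readings are used so the result is easy to check), not the glucose and galactose data.

```python
import numpy as np


def fit_calibration(x_known, y_obs):
    """Least-squares fit of Y = x B with no constant term, as in eq. (1).

    x_known: (n, p) known amounts; y_obs: (n, q) readings.
    Rows of B correspond to x-variables and columns to y-variables.
    """
    B, *_ = np.linalg.lstsq(x_known, y_obs, rcond=None)
    return B


def inverse_estimate(B, y_new):
    """Inverse estimates X = y B^{-1} as in eq. (2); requires p = q."""
    return np.linalg.solve(B.T, y_new)   # X B = y  is equivalent to  B' X' = y'


# Hypothetical calibration: amounts of two sugars and two noise-free readings.
B_true = np.array([[0.020, 0.005],
                   [0.008, 0.015]])
x_known = np.array([[10.0, 0.0], [0.0, 10.0], [10.0, 10.0], [20.0, 5.0], [5.0, 20.0]])
y_obs = x_known @ B_true
B = fit_calibration(x_known, y_obs)
print(inverse_estimate(B, np.array([12.0, 7.0]) @ B_true))
```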

Problems of this kind must be of frequent occurrence in quantitative chemical analysis 
and in other fields. The determination of the accuracy with which estimates can be made 
from such equations is an important practical problem. We now give the mathematical 
derivation of sampling errors and fiducial intervals, before returning to the arithmetical 
analysis of the example just discussed. 


3. SIMULTANEOUS EQUATIONS IN GENERAL
In general, we may consider that we have n observations on each of p independent variables
$x_i$ ($i = 1, 2, \dots, p$) and q dependent variables $y_j$ ($j = 1, 2, \dots, q$), and that we require to
estimate the $y_j$ in terms of the $x_i$ or vice versa. Then we may determine q regression equations

$$Y_j = \sum_i b_{ij}x_i \qquad (j = 1, 2, \dots, q), \qquad (3)$$

in which for simplicity the variables are measured from their means so that the constant
terms vanish.
We adopt the following notation:
$t_{hi}$: sum of products of $x_h$ and $x_i$ ($n - 1$ degrees of freedom),
$u_{jk}$: total sum of products of $y_j$ and $y_k$ ($n - 1$ degrees of freedom),
$v_{jk}$: residual sum of products of $y_j$ and $y_k$ ($n - p - 1$ degrees of freedom),
$T = (t_{hi})$, $T^{-1} = (t^{hi})$,
$U = (u_{jk})$,
$V = (v_{jk})$, $V^{-1} = (v^{jk})$.


Biom. 45 
Lower case $x_i$ or $y_j$ will denote either observed or potentially observed (though sometimes
actually unknown) quantities, while capital $X_i$ or $Y_j$ will denote estimates based on the
observed quantities.


4. DIRECT ESTIMATION 


If one of the $y_j$ or a linear combination of them is to be estimated from the equations (3),
the procedure is straightforward. For the variances of the regression coefficients we have the
familiar results

$$(n-p-1)\,V(b_{ij}) = v_{jj}\,t^{ii},$$

and generally

$$(n-p-1)\,\mathrm{cov}(b_{ij}, b_{hk}) = v_{jk}\,t^{ih}. \qquad (4)$$

Hence, for the variance of an estimate, we have

$$(n-p-1)\,V(Y_j) = v_{jj}\Big(\frac{1}{n} + \sum_h\sum_i t^{hi}x_hx_i\Big), \qquad (5)$$

and in general for the covariance of any two estimates,

$$(n-p-1)\,\mathrm{cov}(Y_j, Y_k) = v_{jk}\Big(\frac{1}{n} + \sum_h\sum_i t^{hi}x_hx_i\Big), \qquad (6)$$

the term $1/n$ being included to allow for the fact that the variables are measured from their
means.

In order to know how much a new observation $y_j$ will vary about the predicted value,
we need the variance about an estimate, as well as the variance of the estimate. We have

$$(n-p-1)\,V(y_j - Y_j) = Hv_{jj}, \qquad (7)$$

and generally

$$(n-p-1)\,\mathrm{cov}(y_j - Y_j,\ y_k - Y_k) = Hv_{jk}, \qquad (8)$$

where

$$H = 1 + \frac{1}{n} + \sum_h\sum_i t^{hi}x_hx_i. \qquad (9)$$
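Formulae (5), (7) and (9) can be checked numerically for one dependent variable. The sketch below (function name hypothetical) measures the variables from their means, as in the text, divides the residual sum of squares by n − p − 1, and returns the two estimated variances; the same values follow from the familiar matrix form with an intercept column.

```python
import numpy as np


def prediction_variances(x, y, x0):
    """Estimated V(Y_j) from eq. (5) and V(y_j - Y_j) from eqs (7) and (9).

    x: (n, p) independent variables, y: (n,) one dependent variable,
    x0: (p,) values of the x's at which Y_j is estimated (original units).
    """
    n, p = x.shape
    xc = x - x.mean(axis=0)                 # variables measured from their means
    yc = y - y.mean()
    T = xc.T @ xc                           # sums of products t_hi
    b = np.linalg.solve(T, xc.T @ yc)       # regression coefficients
    v = float(np.sum((yc - xc @ b) ** 2))   # residual sum of squares v_jj
    d = x0 - x.mean(axis=0)
    quad = 1.0 / n + float(d @ np.linalg.solve(T, d))   # 1/n + sum t^hi x_h x_i
    s2 = v / (n - p - 1)
    return s2 * quad, s2 * (1.0 + quad)     # V(Y_j) and H v_jj / (n - p - 1)
```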


If it is required to estimate a linear combination of the $y_j$, for example

$$y_a = \sum_j a_jy_j,$$

the regression coefficients are linear combinations of the original coefficients, viz.

$$b_{ia} = \sum_j a_jb_{ij},$$

and the regression equation for estimating $y_a$ may be written

$$Y_a = \sum_i b_{ia}x_i.$$

The variance of an estimate is given by

$$(n-p-1)\,V(Y_a) = \sum_j\sum_k a_ja_kv_{jk}\Big(\frac{1}{n} + \sum_h\sum_i t^{hi}x_hx_i\Big). \qquad (10)$$

A special case of a linear combination of the dependent variables is Hotelling's 'most
predictable criterion'. For a linear combination with coefficients $a_j$, the residual sum of
squares after fitting the regression on the $x_i$ is

$$\sum_j\sum_k a_ja_kv_{jk}, \qquad (11)$$

and the total sum of squares is

$$\sum_j\sum_k a_ja_ku_{jk}. \qquad (12)$$
The linear combination which minimizes the ratio of (11) to (12) will clearly be an estimate
of that linear combination which is least affected by departure from regression, and has
been designated by Hotelling the most predictable criterion. The coefficients $a_j$ will be
found as one of the latent vectors of the matrix $V^{-1}U$. Whether this linear combination has
any relevance to the interpretation of the data will depend on the nature of the problem. 
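A minimal sketch of finding the most predictable criterion, assuming U and V are available as numpy arrays. Rather than forming $V^{-1}U$ directly, it reduces the problem with a Cholesky factor of V so that a symmetric eigen-solver can be used; the latent vector with the largest root of $V^{-1}U$ gives the smallest ratio of (11) to (12). The function name and the example matrices are hypothetical.

```python
import numpy as np


def most_predictable(U, V):
    """Coefficients a minimising (a'Va)/(a'Ua), i.e. (11) divided by (12).

    The minimiser is the latent vector of V^{-1}U with the largest root.
    With V = L L', that root is the largest eigenvalue of L^{-1} U L'^{-1},
    and a = L'^{-1} w for the corresponding eigenvector w.
    Returns (a, minimum ratio).
    """
    L = np.linalg.cholesky(V)                          # V = L L'
    Linv = np.linalg.inv(L)
    roots, vecs = np.linalg.eigh(Linv @ U @ Linv.T)    # ascending latent roots
    a = Linv.T @ vecs[:, -1]                           # back-transform the largest one
    return a, float((a @ V @ a) / (a @ U @ a))


# Hypothetical residual (V) and regression (S) sums of products; U = V + S.
V = np.array([[2.0, 0.3], [0.3, 1.0]])
S = np.array([[1.0, 0.8], [0.8, 2.0]])
a, ratio = most_predictable(V + S, V)
print(a, ratio)
```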


5. INVERSE ESTIMATION 


As mentioned earlier, we are most often interested in using a set of simultaneous regression 
equations inversely for estimating values of the independent variables from observed values 
of the dependent variables. This situation arises frequently, for example, in calibration 
experiments, as the example discussed above shows. Now in order that the regression
equations may be solved for the independent variables, it is necessary, generally speaking, 
that the number of equations equal the number of independent variables. If there are 
fewer equations than independent variables, they cannot be solved, and all that can be 
determined are certain relationships among the estimated values of the independent 
variables. On the other hand, if there are more equations than unknown independent
variables, we have redundant information; however, by an adaptation of the method of
least squares, valid estimates of the unknowns may be determined. In this case the
discrepancies of the individual equations from these estimates provide a measure of the
consistency of the different equations and hence of the different dependent variables. We 
shall consider each of these cases in turn. 


(a) p = q.
In this case the regression equations (3) may be solved directly to give the estimates of
the $x_i$, which we denote, without risk of confusion with direct estimates, by $X_i$. The solutions
are

$$X_i = \sum_j b^{ji}y_j, \qquad (13)$$

where the $b^{ji}$ are the elements of the matrix inverse to the square matrix
$B = (b_{ij})$.

We note that in the matrix $B$, rows correspond to x-variables and columns to y-variables,
while in $B^{-1}$, rows correspond to y-variables and columns to x-variables. Thus, for either
direct or inverse regression equations, the regression coefficients corresponding to any
predictand are read down the columns.

We shall show below how tolerance limits for values corresponding to the estimates $X_i$
may be determined by means of the F-test. First of all, however, it is of interest to determine
approximate standard errors for these estimates. These standard errors will be
applicable when the estimated regression coefficients are large compared with their standard
errors, and the inverse regression coefficients are likewise large compared with their
standard errors. This second condition requires in particular that the matrix $B$ be not almost
singular.
Now we have $BB^{-1} = I$;
hence, on taking differentials and multiplying the results by $B^{-1}$, we find

$$d(B^{-1}) = -B^{-1}(dB)\,B^{-1}, \qquad (14)$$

whence

$$db^{ij} = -\sum_h\sum_k b^{ih}b^{kj}\,db_{hk}. \qquad (15)$$
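Equation (14) is easy to check numerically: for a small perturbation dB, the actual change in $B^{-1}$ should agree with $-B^{-1}(dB)B^{-1}$ up to terms of second order, and the element form (15) should give the same matrix. The matrix and the perturbation below are arbitrary, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
B = 3.0 * np.eye(3) + 0.5 * rng.normal(size=(3, 3))   # arbitrary, far from singular
dB = 1e-6 * rng.normal(size=(3, 3))                    # small change in the coefficients
Binv = np.linalg.inv(B)

actual = np.linalg.inv(B + dB) - Binv                  # exact change in B^{-1}
eq14 = -Binv @ dB @ Binv                               # first-order result (14)
eq15 = -np.einsum("ih,hk,kj->ij", Binv, dB, Binv)      # element form (15)
print(np.max(np.abs(actual - eq14)))                   # second order in dB
```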
The equations (14) and (15) represent a linear transformation of the differentials $db_{hk}$.
Taking the direct product (van der Waerden, 1931) of such a transformation and its
transpose, we have

$$d(B^{-1}) \times d(B'^{-1}) = \big(B^{-1}(dB)B^{-1}\big) \times \big(B'^{-1}(dB')B'^{-1}\big) = (B^{-1}\times B'^{-1})\,(dB\times dB')\,(B^{-1}\times B'^{-1}). \qquad (16)$$

Now each of the direct products in equation (16) is a $p^2 \times p^2$ matrix whose typical elements
are products of two regression coefficients or differentials. For instance, the typical element
of $dB \times dB'$ is $db_{ij}\,db_{hk}$.

If we take expectations of each side of equation (16), we get on the left-hand side the
matrix of variances and covariances of the $b^{ji}$, while on the right-hand side the middle
factor gives the matrix of variances and covariances of the $b_{ij}$. Now, as we have seen, the
appropriate estimate of the covariance of $b_{ij}$ and $b_{hk}$ is

$$t^{ih}v_{jk}/(n-p-1).$$

If we make a suitable permutation of rows and columns, the expected value of the middle
factor therefore becomes the direct product

$$T^{-1} \times V/(n-p-1)$$

of two $p \times p$ matrices.


Denoting the estimated expected value of the left-hand side by $W$, we find, again after
suitable permutations of rows and columns, that

$$(n-p-1)\,W = (B^{-1}\times B'^{-1})(T^{-1}\times V)(B'^{-1}\times B^{-1}) = (B^{-1}T^{-1}B'^{-1}) \times (B'^{-1}VB^{-1}) = M^{-1}\times Q^{-1},$$

where $M = B'TB$ and $Q = BV^{-1}B'$, so that $M^{-1} = B^{-1}T^{-1}B'^{-1}$ and $Q^{-1} = B'^{-1}VB^{-1}$.

This result gives in particular

$$(n-p-1)\,\mathrm{cov}(b^{ji}, b^{kh}) = m^{jk}q^{ih},$$

where $m^{jk}$ and $q^{ih}$ are elements of $M^{-1}$ and $Q^{-1}$.


These results are, of course, approximate and will often be inaccurate; their interest lies
in the fact that the expressions found are similar to those occurring in the exact analysis.

We may now determine approximate variances and covariances of estimates $X_i$ based
on observations $y_j$.


$$(n-p-1)\,V(X_i) = \Big(1+\frac{1}{n}\Big)\sum_j\sum_k b^{ji}b^{ki}v_{jk} + \sum_j\sum_k y_jy_k\,m^{jk}q^{ii} = q^{ii}\Big(1 + \frac{1}{n} + \sum_l\sum_{l'} t^{ll'}X_lX_{l'}\Big). \qquad (19)$$
hh 
This result follows from the formula for the approximate variance of a product, and from
the fact that the $b^{ji}$ and the $y_j$ are independent. Similarly, to the same degree of approximation,

$$(n-p-1)\,\mathrm{cov}(X_i, X_h) = \sum_j\sum_k b^{ji}b^{kh}v_{jk}\Big(1 + \frac{1}{n} + \sum_l\sum_{l'} t^{ll'}X_lX_{l'}\Big). \qquad (20)$$

The covariance matrix of the $X_i$ may be written $HQ^{-1}/(n-p-1)$, where $Q^{-1} = B'^{-1}VB^{-1}$,
as above, and $H$ is here a function of the estimates rather than of observed values as defined
in (9). It may be noted that these results are analogous to those found in direct estimation
of an observation $y_j$. There we have

$$(n-p-1)\,V(y_j - Y_j) = v_{jj}H, \qquad (n-p-1)\,\mathrm{cov}(y_j - Y_j,\ y_k - Y_k) = v_{jk}H,$$

where the $x_i$ are now observed quantities, the $Y_j$ are regression estimates, and the $y_j$ are
new observations, not used in determining the regression.
The exact determination of sampling variation is not much more complicated. We may
find simultaneous limits for the unknown quantities $x_i$ in the following way. The ratio

$$F = \frac{n-2p}{pH}\sum_j\sum_k v^{jk}\Big(y_j - \sum_i b_{ij}x_i\Big)\Big(y_k - \sum_h b_{hk}x_h\Big) \qquad (21)$$

is distributed as F with $p$ and $n - 2p$ degrees of freedom. By substituting various sets of
values of the $x_i$ in the formula we can determine for which sets the associated value of F
is non-significant, and hence which sets are concordant with the data. The range of
concordant sets of the $x_i$ defines a fiducial region for the values.

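Now since we may write $y_j = \sum_i b_{ij}X_i$, the $y_j$ being observations and the $X_i$ estimates, we have

$$\sum_j\sum_k v^{jk}\Big(y_j-\sum_i b_{ij}x_i\Big)\Big(y_k-\sum_{i'} b_{i'k}x_{i'}\Big) = \sum_j\sum_k v^{jk}\sum_i\sum_{i'}b_{ij}b_{i'k}(X_i-x_i)(X_{i'}-x_{i'}) = \sum_i\sum_{i'}q_{ii'}(X_i-x_i)(X_{i'}-x_{i'}), \qquad (22)$$

where $q_{ii'} = \sum_j\sum_k b_{ij}v^{jk}b_{i'k}$. Since $q_{ii'}$ is a typical element of the matrix $Q = BV^{-1}B'$, (22) may be written $(X-x)'BV^{-1}B'(X-x)$. Hence, the simultaneous fiducial limits for the values $x_i$ are given by the solution (if real) of

$$\frac{n-2p}{pH}\sum_i\sum_{i'}q_{ii'}(X_i-x_i)(X_{i'}-x_{i'}) = F. \qquad (23)$$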
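Criterion (21) and its rewriting through (22) can be sketched as follows. This is an illustration under stated assumptions, not the author's computation. The function names are hypothetical. B is stored with rows for the $x_i$ and columns for the $y_j$, V is the residual sums-of-squares-and-products matrix, and H is passed in as a plain number rather than computed from (9).

```python
import numpy as np

def f_simultaneous(y, x, B, V, H, n):
    """Criterion (21): F = (n - 2p)/(pH) * r' V^{-1} r, with r = y - B'x."""
    p = B.shape[0]
    r = y - B.T @ x                          # y_j - sum_i b_ij x_i
    return (n - 2 * p) / (p * H) * float(r @ np.linalg.solve(V, r))

def f_via_estimates(y, x, B, V, H, n):
    """The same criterion in the form (22)-(23): (X - x)' Q (X - x)."""
    p = B.shape[0]
    X = np.linalg.solve(B.T, y)              # estimates X from y = B'X
    Q = B @ np.linalg.solve(V, B.T)          # Q = B V^{-1} B'
    d = X - x
    return (n - 2 * p) / (p * H) * float(d @ Q @ d)
```

The two functions should agree for any candidate x; that agreement is the content of (22).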
If limits for a single value $x_i$ are required, we have

$$V(X_i) = Hq^{ii}/(n-p-1). \qquad (24)$$

Now since $Q = BV^{-1}B'$, $Q^{-1} = B'^{-1}VB^{-1}$, so that $q^{ii} = \sum_j\sum_k b^{ji}b^{ki}v_{jk}$. Hence, with 1 and n − p − 1 degrees of freedom,

$$F = \frac{(n-p-1)(X_i-x_i)^2}{H\sum_j\sum_k v_{jk}b^{ji}b^{ki}}. \qquad (25)$$

Since the $x_i$ are unknown, the approximate variance estimate based on the $X_i$ would need to be used to give fiducial limits for a single $x_i$.


(b) p < q

In this case we have more equations than unknowns. We have a choice here either of omitting q − p of the equations (provided we can decide from prior considerations which are least useful), or of using the additional information given by the equations to test the consistency of the relationships involving the different dependent variables. This latter aspect is the one that we shall examine.

If an observation of a set $y_j$ (j = 1, 2, ..., q) of dependent variables is to be used to estimate a set $x_i$ (i = 1, 2, ..., p), we may so determine the estimates that they have minimum (estimated) variance. Now since the estimated covariance of $y_j$ and $y_k$ is proportional to $v_{jk}$, the quantity to be minimized, with respect to the $x_i$, is

$$L = \sum_j\sum_k v^{jk}\Big(y_j-\sum_i b_{ij}x_i\Big)\Big(y_k-\sum_i b_{ik}x_i\Big).$$


If we put, as in (a), $Q = BV^{-1}B'$, so that $q_{ii'} = \sum_j\sum_k b_{ij}v^{jk}b_{i'k}$, and also put

$$P = BV^{-1}y, \qquad (26)$$

so that $p_i = \sum_j\sum_k b_{ij}v^{jk}y_k$, we find for the normal equations

$$QX = P, \qquad (27)$$

i.e. $\sum_{i'}q_{ii'}X_{i'} = p_i$, so that

$$X = Q^{-1}P, \qquad (28)$$

or $X_i = \sum_{i'}q^{ii'}p_{i'}$.

These results are similar to those found for the case p = q, except that here the matrix B does not possess an inverse, so that the estimates need to be expressed in terms of the matrices P and Q.
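The normal equations (27)-(28) are generalized least squares with weight matrix $V^{-1}$. The sketch below assumes that reading. `inverse_estimate` is a hypothetical helper, and B is stored as a p × q array of the $b_{ij}$.

```python
import numpy as np

def inverse_estimate(y, B, V):
    """Estimates X for p < q from the normal equations QX = P, (27)-(28).

    B: p x q coefficients b_ij; V: q x q residual SP matrix; y: q new
    observations.  Returns X = Q^{-1} P.
    """
    Vinv_Bt = np.linalg.solve(V, B.T)    # V^{-1} B'   (q x p)
    Q = B @ Vinv_Bt                      # Q = B V^{-1} B'   (p x p)
    P = Vinv_Bt.T @ y                    # P = B V^{-1} y  (V is symmetric)
    return np.linalg.solve(Q, P)
```

As a cross-check, the same X comes out of whitening with the Cholesky factor of V and calling an ordinary least-squares solver.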


As in the case p = q, the estimated covariance matrix of the $X_i$ is $HQ^{-1}/(n-p-1)$. To test the consistency of the equations, the criterion is

$$F = \frac{n-p-q}{(q-p)H}\sum_j\sum_k v^{jk}\Big(y_j-\sum_i b_{ij}X_i\Big)\Big(y_k-\sum_i b_{ik}X_i\Big), \qquad (29)$$

which is distributed as F with q − p and n − p − q degrees of freedom.

If the value of F is not significant, there is no evidence for regarding the equations as inconsistent, and fiducial limits may be determined for the $x_i$. For these we have

$$F = \frac{n-p-q}{pH}\sum_i\sum_{i'}q_{ii'}(X_i-x_i)(X_{i'}-x_{i'}), \qquad (30)$$

with p and n − p − q degrees of freedom. This may be written in the alternative form

$$F = \frac{n-p-q}{pH}\sum_i\sum_{i'}q^{ii'}\Big(p_i-\sum_h q_{ih}x_h\Big)\Big(p_{i'}-\sum_{h'}q_{i'h'}x_{h'}\Big). \qquad (30')$$

By means of this criterion, the concordance of any set of $x_i$ with the data may be established.
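The quadratic forms behind (29), (30) and (30') can be computed together. The sketch leaves out the constant factors, which depend on n, p, q and H. It also confirms two facts used above: (30) and (30') agree, and the criterion for a candidate x splits into the consistency part plus the fiducial part. The helper name is hypothetical.

```python
import numpy as np

def consistency_and_fiducial(y, x, B, V):
    """Quadratic forms of (29), (30) and (30') for observations y and a
    candidate x, without the constant factors."""
    Vinv = np.linalg.inv(V)
    Q = B @ Vinv @ B.T                   # Q = B V^{-1} B'
    P = B @ Vinv @ y                     # P = B V^{-1} y
    X = np.linalg.solve(Q, P)            # estimates, (28)
    r = y - B.T @ X                      # y_j - sum_i b_ij X_i
    consistency = r @ Vinv @ r           # quadratic form in (29)
    d = X - x
    fiducial = d @ Q @ d                 # quadratic form in (30)
    g = P - Q @ x
    fiducial_alt = g @ np.linalg.solve(Q, g)   # quadratic form in (30')
    return consistency, fiducial, fiducial_alt
```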
In the particular case when p = 1, the solution of the equations gives the discriminant function for assigning a value of $x_1$ on the basis of observations of the q variables $y_1, y_2, \ldots, y_q$. The discriminant function is

$$X_1 = \frac{\sum_j\sum_k v^{jk}b_{1j}y_k}{\sum_j\sum_k v^{jk}b_{1j}b_{1k}}. \qquad (31)$$

To test the consistency of any set of observations $y_j$, the criterion is

$$F = \frac{n-q-1}{(q-1)H}\sum_j\sum_k v^{jk}(y_j-b_{1j}X_1)(y_k-b_{1k}X_1), \qquad (32)$$

with q − 1 and n − q − 1 degrees of freedom.

It should be remarked here that this is a test, not of the discriminant function, which 
has been established from previous data, but of the consistency of the present set of obser- 
vations. A significant result may indicate either that the values of y, are not consistent 
among themselves, or that the discriminant function determined from previous data does 
not apply to the present observations. 
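For p = 1 the estimate (31) is a ratio of two bilinear forms in $V^{-1}$. The residual quadratic form in (32) equals $y'V^{-1}y - P^2/Q$. The sketch below assumes the single row of coefficients $b_{1j}$ is held as a vector `b`. `discriminant` is a hypothetical name.

```python
import numpy as np

def discriminant(y, b, V):
    """(31): X_1 = sum v^{jk} b_j y_k / sum v^{jk} b_j b_k, together with the
    residual quadratic form used in the consistency criterion (32)."""
    w = np.linalg.solve(V, b)            # V^{-1} b
    X1 = float(w @ y) / float(w @ b)
    r = y - b * X1                       # y_j - b_j X_1
    resid = float(r @ np.linalg.solve(V, r))
    return X1, resid
```

If the observations are exactly proportional to the coefficients, for example y = 3b, the estimate is 3 and the residual form vanishes. A consistent set of observations should look like this, apart from sampling error.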

(c) p > q

In this case we have fewer equations than unknowns, so that estimates of the unknown $x_i$ cannot be determined. The most that can be done is to find a relationship among p − q + 1 of the estimates $X_i$. In many cases such a relationship may be all that is required, as is indicated in the example given below.

Suppose that we wish to eliminate $X_1, X_2, \ldots, X_{q-1}$, and to determine the relationship among $X_q, X_{q+1}, \ldots, X_p$. The determinant of the first q − 1 rows of B and the q − 1 columns resulting from omitting column j will be denoted by $(-1)^jB_j$. Then it is readily shown that the required relationship is

$$\sum_{j=1}^{q}B_j\sum_{i=q}^{p}b_{ij}X_i = \sum_{j=1}^{q}B_jy_j. \qquad (33)$$

The fiducial limits for the corresponding relationship among the $x_i$ can only approximately be determined.
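The weights $B_j$ in (33) are signed minors of the first q − 1 rows of B. Expanding a determinant that contains a repeated row shows that $\sum_j B_jb_{ij} = 0$ for i < q, so $X_1, \ldots, X_{q-1}$ drop out. The sketch below assumes B is a p × q array. `elimination_weights` is a hypothetical helper.

```python
import numpy as np

def elimination_weights(B):
    """Weights B_j of (33): (-1)^j B_j is the determinant of the first q - 1
    rows of B with column j omitted (j = 1, ..., q in the text)."""
    p, q = B.shape
    top = B[: q - 1, :]
    w = np.empty(q)
    for j in range(q):                       # j = 0 here is j = 1 in the text
        minor = np.delete(top, j, axis=1)    # (q-1) x (q-1)
        w[j] = (-1) ** (j + 1) * np.linalg.det(minor)
    return w
```

Applied to exact data y = B'X, the two sides of (33) then agree for any X.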
The fact that p > q does not, however, prevent simultaneous fiducial limits for the $x_i$ from being found. The criterion, distributed as F with q and n − p − q degrees of freedom, from which simultaneous fiducial limits may be derived, is

$$F = \frac{n-p-q}{qH}\sum_j\sum_k v^{jk}\Big(y_j-\sum_i b_{ij}x_i\Big)\Big(y_k-\sum_i b_{ik}x_i\Big) = \frac{n-p-q}{qH}\sum_i\sum_{i'}q_{ii'}(X_i-x_i)(X_{i'}-x_{i'}), \qquad (34)$$

where $q_{ii'}$ is the typical element of the matrix Q defined above. Here Q, though it is a p × p matrix, is of rank q.


6. DISCUSSION OF THE CHEMICAL EXAMPLE 


The original data of the experiment discussed in §2 are given by Fisher, Hansen & Norton (1955) in their Table 1, so are not reproduced here.


Table 1. Analyses of variance and covariance of optical density measurements at 470 mμ (y₁) and at 560 mμ (y₂) (Fisher, Hansen & Norton's data)

                          Degrees of    Sums of squares and products
                          freedom       y₁²          y₁y₂         y₂²
Regression on x₁, x₂         2          2.570253     4.207267     6.995805
Residual                    26          0.003167     0.002996     0.006733
Total                       28          2.573420     4.210263     7.002538


Table 2. Matrices of sums of squares and products, and of regression coefficients

      T                   10⁶ V                B
 0.2500   0.0750       3167    2996       1.2166   1.3465
 0.0750   0.2500       2996    6733       2.6240   4.7276

      T⁻¹                 V⁻¹                  B⁻¹
 4.3956  −1.3187       545.3  −242.6       2.1311  −0.6070
−1.3187   4.3956      −242.6   256.5      −1.1829   0.5484


Fisher et al. (1955) fitted quadratic regressions … that the quadratic terms were significant only at the 5% …
Thus we see from Table 2 that the direct regression equations are

$$Y_1 = 1.2166x_1 + 2.6240x_2, \qquad Y_2 = 1.3465x_1 + 4.7276x_2, \qquad (35)$$

and the inverse equations are

$$X_1 = 2.1311y_1 - 1.1829y_2, \qquad X_2 = -0.6070y_1 + 0.5484y_2, \qquad (36)$$

in agreement with the results of Fisher et al. (1955).

The direct equations are less useful than the inverse ones. Since in this example the numbers of dependent and independent variables are equal, no test for consistency is possible, but we can derive fiducial limits for the values of $x_1$ and $x_2$ corresponding to observed values $y_1$ and $y_2$.
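The inverse equations (36) come from inverting the 2 × 2 matrix of direct coefficients. The sketch below takes the (2, 2) coefficient as 4.7276, the value that reproduces the printed inverse. It stores B with rows for $x_1, x_2$ and columns for $y_1, y_2$. `inverse_estimates` is a hypothetical helper.

```python
import numpy as np

# Direct coefficients b_ij (rows x_1, x_2; columns y_1, y_2), so that
# Y_j = sum_i b_ij x_i as in (35).
B = np.array([[1.2166, 1.3465],
              [2.6240, 4.7276]])

Binv = np.linalg.inv(B)   # element (j, i) is b^{ji}

def inverse_estimates(y1, y2):
    """X = B'^{-1} y, i.e. X_i = sum_j b^{ji} y_j as in (36)."""
    return Binv.T @ np.array([y1, y2])

print(np.round(Binv, 4))
```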


Table 3. Matrix products required in estimating variances

 M⁻¹ = B⁻¹T⁻¹B′⁻¹          10⁶ Q⁻¹ = B′⁻¹VB⁻¹
  24.995   −15.032           8699   −2812
 −15.032     9.183          −2812    1197


Since the inverse regression coefficients, as well as the direct coefficients, are likely to be well determined, we may calculate their approximate standard errors. Table 3 gives the matrices $B^{-1}T^{-1}B'^{-1}$ and $B'^{-1}VB^{-1}$ required in these calculations. Then, for example, the variance of $b^{21}$ is obtained using the second diagonal term of $B^{-1}T^{-1}B'^{-1}$ and the first diagonal term of $B'^{-1}VB^{-1}$:

$$9.183\times10^{-6}\times8699/26 = 0.003072,$$

so that the standard error of $b^{21}$ is 0.055. The standard errors of the coefficients may be set out as follows:

 0.091   0.034
 0.055   0.021


For general purposes, of course, the covariances as well as the variances of the regression coefficients will be of interest.
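The standard errors above follow the rule $\mathrm{var}(b^{ji}) = m^{jj}q^{ii}/26$, with 26 the residual degrees of freedom of Table 1. The sketch below recomputes Table 3 and the standard errors from the Table 2 entries. It assumes V is the residual SP matrix with the 10⁻⁶ scaling of Table 2 and the (2, 2) direct coefficient 4.7276.

```python
import numpy as np

B = np.array([[1.2166, 1.3465], [2.6240, 4.7276]])
T = np.array([[0.2500, 0.0750], [0.0750, 0.2500]])
V = 1e-6 * np.array([[3167.0, 2996.0], [2996.0, 6733.0]])
resid_df = 26                                  # residual d.f. (Table 1)

Binv = np.linalg.inv(B)
M_inv = Binv @ np.linalg.inv(T) @ Binv.T       # B^{-1} T^{-1} B'^{-1}
Q_inv = Binv.T @ V @ Binv                      # B'^{-1} V B^{-1}

# var(b^{ji}) = m^{jj} q^{ii} / 26; se[j, i] is laid out like B^{-1}.
se = np.sqrt(np.outer(np.diag(M_inv), np.diag(Q_inv)) / resid_df)
print(np.round(se, 3))
```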

In determining the approximate variance of an estimate $X_i$, since the regression is through the origin rather than the point of means, the actual values of the $y_j$ rather than departures from means are used, and the term 1/n is omitted from the variance estimates in equation (19).

Thus, approximately,

$$V(X_1) = \frac{8699\times10^{-6}}{26}\,(1 + 24.995y_1^2 - 30.06y_1y_2 + 9.183y_2^2) = \frac{8699\times10^{-6}}{26}\,(1 + 4.396X_1^2 - 2.637X_1X_2 + 4.396X_2^2),$$

with similar results for $\mathrm{cov}(X_1, X_2)$ and $V(X_2)$.
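The two forms of $V(X_1)$ agree because $y = B'X$ gives $y'M^{-1}y = X'T^{-1}X$. The sketch assumes the through-origin version of (19)-(20), which uses raw y values and no 1/n term, with the Table 2 matrices. `approx_cov_X` is a hypothetical helper.

```python
import numpy as np

B = np.array([[1.2166, 1.3465], [2.6240, 4.7276]])
T = np.array([[0.2500, 0.0750], [0.0750, 0.2500]])
V = 1e-6 * np.array([[3167.0, 2996.0], [2996.0, 6733.0]])
resid_df = 26

Binv = np.linalg.inv(B)
Tinv = np.linalg.inv(T)
M_inv = Binv @ Tinv @ Binv.T      # B^{-1} T^{-1} B'^{-1}
Q_inv = Binv.T @ V @ Binv         # B'^{-1} V B^{-1}

def approx_cov_X(y):
    """Approximate covariance matrix of (X_1, X_2) for observed (y_1, y_2):
    Q^{-1} (1 + y' M^{-1} y) / 26 (regression through the origin)."""
    y = np.asarray(y, dtype=float)
    return Q_inv * (1.0 + y @ M_inv @ y) / resid_df
```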


7. AN EXAMPLE OF INVERSE ESTIMATION WHERE p > q

A study of the pulping properties of eucalypt woods is reported by … (1951). The object of the studies was to determine a treatment which would produce pulp of the required lignin content from wood with certain characteristics. The percentage of the wood material soluble in hot water (hot-water solubles, $x_1$) was determined for each wood sample, which was then divided into four parts, each being pulped with varying amounts of active alkali ($x_2$% of the wood weight). The same levels of active alkali were repeated for each sample, so that the two independent variables were uncorrelated. The lignin content of the resulting pulp was measured in terms of a 'permanganate number', and its logarithm to base 10 (y) taken as the dependent variable. The data are shown in Table 4.


Table 4. Data from a study of pulping properties of eucalypt woods

x₁ = percentage hot-water solubles; x₂ = percentage active alkali used in pulping; y = log permanganate number

  x₁     x₂    y        x₁     x₂    y        x₁     x₂    y
 5.97    15   1.425    6.79    15   1.498   13.19    15   1.734
         17   1.250            17   1.330            17   1.535
         19   1.170            19   1.233            19   1.326
         21   1.124            21   1.161            21   1.201
 8.00    15   1.641    9.20    15   1.442    9.52    15   1.500
         17   1.418            17   1.255            17   1.281
         19   1.230            19   1.146            19   1.152
         21   1.164            21   1.093            21   1.104
 8.51    15   1.655   10.00    15   1.507    9.46    15   1.610
         17   1.384            17   1.332            17   1.425
         19   1.334            19   1.220            19   1.283
         21   1.164            21   1.199            21   1.204
 4.51    15   1.486   10.94    15   1.667    3.17    15   1.204
         17   1.272            17   1.458            17   1.130
         19   1.185            19   1.258            19   1.083
         21   1.124            21   1.173            21   1.004
 3.15    15   1.250    6.35    15   1.391    3.53    15   1.236
         17   1.146            17   1.207            17   1.149
         19   1.086            19   1.100            19   1.061
         21   1.033            21   1.079            21   1.025

Totals of y:  x₂ = 15: 22.246;  x₂ = 17: 19.572;  x₂ = 19: 17.867;  x₂ = 21: 16.852;  grand total 76.537
Means:  x₁ = 7.486;  x₂ = 18;  y = 1.2756
This example does not illustrate the use of simultaneous equations, but it does show how inverse estimation is possible when there are more independent than dependent variables. The appropriate regression is that of y on $x_1$ and $x_2$. What is required from the data, however, is an estimate of the relationship of $x_2$ to $x_1$ corresponding to a fixed value of y. In other words, we want the alkali requirement $x_2$ which will result on the average in a given lignin content Y, when the hot-water solubles figure $x_1$ is known.
The relevant sums of squares and products are shown in Table 5. The regression equation is found to be

$$Y = 2.123 + 0.0301x_1 - 0.0596x_2. \qquad (37)$$

The analysis of variance in Table 6 shows this regression to be highly significant, and the residual variance to be 0.005804. Since the 1% point of F with 1 and 57 degrees of freedom is 7.102, the 99% fiducial boundary for the regression relationship is

$$(Y - 2.123 - 0.0301x_1 + 0.0596x_2)^2 = 7.102\times0.005804\Big\{\frac{1}{60} + \frac{(x_1-7.486)^2}{516.315} + \frac{(x_2-18)^2}{300}\Big\}. \qquad (38)$$

The lignin content required for the pulp corresponds to a 'permanganate number' of 15 (i.e. Y = 1.176); this value inserted in the equation gives the relationship

$$X_2 = 15.89 + 0.504x_1,$$

so that, once the hot-water solubles percentage is given, the requirement of active alkali can be estimated. The fiducial boundary for the relationship is given by substituting Y = 1.176 in equation (38).
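The sketch below solves the fitted equation for $x_2$ and finds the two boundary values of $x_2$ at a given $x_1$. It assumes the boundary uses the usual variance of a fitted mean, $s^2\{1/n + (x_1-\bar x_1)^2/S_{11} + (x_2-\bar x_2)^2/S_{22}\}$, with the sums of squares of Table 5. The function names are hypothetical.

```python
import numpy as np

# Quantities from Tables 4-6 (means, sums of squares, coefficients).
n, ybar, x1bar, x2bar = 60, 76.537 / 60, 7.486, 18.0
S11, S22 = 516.315, 300.0
b1, b2 = 0.030077, -0.059623
a = ybar - b1 * x1bar - b2 * x2bar          # intercept, about 2.1237
s2, F01 = 0.005804, 7.102                   # residual variance; 1% point of F(1, 57)
Y_target = np.log10(15.0)                   # permanganate number 15

def alkali_required(x1):
    """Point estimate X_2 solving Y_target = a + b1*x1 + b2*X_2."""
    return (Y_target - a - b1 * x1) / b2

def fiducial_x2(x1):
    """Roots in x2 of (c - b2*x2)^2 = F s^2 {1/n + (x1-x1bar)^2/S11
    + (x2-x2bar)^2/S22}, where c = Y_target - a - b1*x1."""
    c = Y_target - a - b1 * x1
    k = F01 * s2
    quad = b2 ** 2 - k / S22
    lin = -2 * c * b2 + 2 * k * x2bar / S22
    const = c ** 2 - k * (1 / n + (x1 - x1bar) ** 2 / S11 + x2bar ** 2 / S22)
    roots = np.roots([quad, lin, const])
    if np.iscomplexobj(roots) and np.any(np.abs(roots.imag) > 1e-12):
        raise ValueError("no real boundary at this x1")
    return np.sort(roots.real)
```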


Table 5. Calculation of regression coefficients from values in Table 4

        Sum of       Sum of products    Regression
        squares      with y             coefficient
x₁      516.315       15.5292           +0.030077 ± 0.0034
x₂      300          −17.887            −0.059623 ± 0.0044

Table 6. Analysis of variance

              Degrees of    Sum of      Mean
              freedom       squares     square
Regression     2            1.5336      0.7668**
Residual      57            0.3308      0.005804
Total         59            1.8644

** Significant at 1% level.
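Table 4 is small enough to refit directly. The NumPy sketch below enters the table and fits y on $x_1$ and $x_2$ by ordinary least squares. It is a check, not the original computation. Every wood sample received the same four alkali levels, so $x_1$ and $x_2$ are uncorrelated, and the fitted slopes equal the simple ratios in Table 5.

```python
import numpy as np

# Table 4: x1 for the 15 wood samples, and y = log10(permanganate number)
# at active-alkali levels x2 = 15, 17, 19, 21 (one row per sample).
x1_samples = np.array([5.97, 8.00, 8.51, 4.51, 3.15, 6.79, 9.20, 10.00,
                       10.94, 6.35, 13.19, 9.52, 9.46, 3.17, 3.53])
levels = np.array([15.0, 17.0, 19.0, 21.0])
y_table = np.array([
    [1.425, 1.250, 1.170, 1.124], [1.641, 1.418, 1.230, 1.164],
    [1.655, 1.384, 1.334, 1.164], [1.486, 1.272, 1.185, 1.124],
    [1.250, 1.146, 1.086, 1.033], [1.498, 1.330, 1.233, 1.161],
    [1.442, 1.255, 1.146, 1.093], [1.507, 1.332, 1.220, 1.199],
    [1.667, 1.458, 1.258, 1.173], [1.391, 1.207, 1.100, 1.079],
    [1.734, 1.535, 1.326, 1.201], [1.500, 1.281, 1.152, 1.104],
    [1.610, 1.425, 1.283, 1.204], [1.204, 1.130, 1.083, 1.004],
    [1.236, 1.149, 1.061, 1.025],
])

x1 = np.repeat(x1_samples, 4)          # each sample's x1 repeated over the 4 levels
x2 = np.tile(levels, 15)
y = y_table.ravel()                    # row-major: sample by sample

X = np.column_stack([np.ones_like(y), x1, x2])
coef, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef
s2 = rss[0] / (len(y) - 3)             # residual mean square, 57 d.f.
```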


8. PROPORTIONAL REGRESSIONS 

In certain cases it is of interest to fit equations in which the coefficients are proportional. For instance, in studying various properties of coals, such as their carbon content, sulphur content and calorific value, it may be supposed that each is linearly related to the percentage ash content. If the ash were simply the result of admixed impurities, its effect might be expected to be a simple percentage reduction, the same for each of the properties. Thus, if x were the ash content, and the regression of the jth property $y_j$ on x were

$$Y_j = b_{0j} + b_{1j}x, \qquad (39)$$

we should expect $b_{1j}$ to be negative, and $b_{1j}/b_{0j}$ to be in the neighbourhood of −1/100. In general, if the theoretical value of the ratio were $-1/\xi$, we could fit the restricted regression equations

$$Y_j = b_j(x-\xi), \qquad (40)$$

the value of ξ being the same for each line.

We should then be interested in testing, first, the validity of the assumption of a constant value ξ for the different dependent variables, and secondly, the acceptability of various values of ξ.

We shall here consider only the case of one independent variable, which seems to be of 
most practical interest; the extension to more than one independent variable introduces 
no new principle. The additional complication in fitting proportional equations arises from 
the fact that there is a constant common to all the equations. 

Since the equations of estimation of ξ are not linear, the method of least squares does not lead to exact significance tests and fiducial limits. Instead, the following method is adopted. The null hypothesis on which the test is based is that the regressions are proportional, with constant of proportionality ξ. The constant is unspecified, so that the test criteria are functions of ξ. If for any particular values of ξ the test criteria are significant, the null hypothesis, and the corresponding value of ξ, are rejected at the level of significance adopted. We are thus able to set fiducial limits on ξ.

We shall denote the sum of squares of x by t, and the sum of products of $y_j$ with x by $p_j$, and shall adopt the following notation for the restricted regressions with constant ξ:

$$p'_j = S\,y_j(x-\xi), \qquad t' = S(x-\xi)^2, \qquad b'_j = p'_j/t'.$$

To test the validity of the hypothesis, consider the unrestricted regressions, which may be written

$$Y_j = \bar y_j + b_j(x-\bar x), \qquad b_j = p_j/t.$$

When x = ξ, $Y_j$ should differ from zero only by errors of random sampling. Hence, the q quantities

$$z_j = \bar y_j + b_j(\xi-\bar x) \qquad (41)$$

have a joint normal distribution centred on zero. The analysis of these quantities provides tests of the hypothesis.

The covariance of $z_j$ and $z_k$, estimated with n − 2 degrees of freedom, is

$$\frac{v_{jk}t'}{nt(n-2)},$$

so that the sum of squares of these quantities may be taken as

$$\frac{nt}{t'}\sum_j\sum_k v^{jk}z_jz_k. \qquad (42)$$


This is distributed as the ratio of two independent sums of squares with q and n − q − 1 degrees of freedom, so may be tested directly by means of the F-distribution. We have

$$F = \frac{(n-q-1)nt}{qt'}\sum_j\sum_k v^{jk}z_jz_k. \qquad (43)$$

This provides an overall test of the assumption of proportionality with a specified value of the constant. This may be partitioned to give tests separately of the two aspects:
Now the quantities $b'_j$, or the equivalent $p'_j$, can be shown to represent the variation among the restricted regressions. Also, since

$$p'_j = p_j - n\bar y_j(\xi-\bar x), \qquad (44)$$

we see that the expected value of the correlation of $z_j$ and $p'_j$ between lines is zero. This is an expression of the fact that the $p'_j$ account for all, and the $z_j$ for none, of the variation between lines. Hence, the sample regression of the $z_j$ on the $p'_j$ provides a test criterion for the hypothetical value of ξ.

The sum of squares for regression is

$$\frac{nt}{t'}\,\frac{\big(\sum_j\sum_k v^{jk}z_jp'_k\big)^2}{\sum_j\sum_k v^{jk}p'_jp'_k}. \qquad (45)$$

This is distributed as the ratio of two independent sums of squares with 1 and n − q − 1 degrees of freedom, and thus may be tested by the F distribution:

$$F = \frac{(n-q-1)\,nt\,\big(\sum_j\sum_k v^{jk}z_jp'_k\big)^2}{t'\sum_j\sum_k v^{jk}p'_jp'_k}. \qquad (46)$$

The sum of squares, with q − 1 and n − q − 1 degrees of freedom, for departures of the $z_j$ from regression on the $p'_j$ is given by the difference between (42) and (45). It is available for testing departure from proportionality.
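The partition (42) = (45) + departure can be computed directly for a trial ξ. The sketch below assumes x is an (n,) array and the q dependent variables are the columns of an (n, q) array. V is the residual SP matrix of the unrestricted regressions, and S denotes raw sums over the observations. The helper name is hypothetical.

```python
import numpy as np

def proportionality_ss(x, Y, xi):
    """Sums of squares (42) and (45), their difference (departure from
    proportionality), and the F ratios (43) and (46) for a trial xi.
    Sums of squares are in units of the error sum of squares."""
    n, q = Y.shape
    xbar, ybar = x.mean(), Y.mean(axis=0)
    dx = x - xbar
    t = dx @ dx                                  # sum of squares of x
    b = (dx @ Y) / t                             # unrestricted slopes b_j = p_j / t
    resid = Y - ybar - np.outer(dx, b)
    Vinv = np.linalg.inv(resid.T @ resid)        # V: residual SP, n - 2 d.f.
    t_r = ((x - xi) ** 2).sum()                  # t'
    z = ybar + b * (xi - xbar)                   # (41)
    p_r = (x - xi) @ Y                           # p'_j = S y_j (x - xi)
    total = n * t / t_r * (z @ Vinv @ z)                              # (42)
    reg = n * t / t_r * (z @ Vinv @ p_r) ** 2 / (p_r @ Vinv @ p_r)    # (45)
    F_all = (n - q - 1) / q * total                                   # (43)
    F_const = (n - q - 1) * reg                                       # (46)
    return total, reg, total - reg, F_all, F_const
```

For synthetic lines that really meet at ξ = 100, the overall criterion is small at ξ = 100 and very large at a wrong value such as ξ = 0.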

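It is convenient to express these sums of squares and products in a form which shows explicitly their dependence on ξ. If we write

$$J = n\sum_j\sum_k v^{jk}\bar y_j\bar y_k, \qquad K = n\sum_j\sum_k v^{jk}\bar y_jb_k, \qquad L = n\sum_j\sum_k v^{jk}b_jb_k,$$

then the total sum of squares (42) of the $z_j$ is

$$\frac{t}{t'}\{J + 2(\xi-\bar x)K + (\xi-\bar x)^2L\}, \qquad (47)$$

while the sum of squares (45) for regression of $z_j$ on $p'_j$ is

$$\frac{t\,[(\xi-\bar x)^2K - (\xi-\bar x)\{(t/n)L - J\} - (t/n)K]^2}{t'\,[(\xi-\bar x)^2J - 2(\xi-\bar x)(t/n)K + (t/n)^2L]}. \qquad (48)$$

The sum of squares for departure from proportionality is found, by subtraction, to be

$$\frac{tt'(JL-K^2)}{n^2[(\xi-\bar x)^2J - 2(\xi-\bar x)(t/n)K + (t/n)^2L]}. \qquad (49)$$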
ame aT — 4€-2) (1n) L] 
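The full analysis may be set up in the form of an analysis of variance, as follows:

$$\begin{array}{lcl}
 & \text{Degrees of freedom} & \text{Sum of squares}\\
\text{Constant of proportionality} & 1 & \dfrac{t\,[(\xi-\bar x)^2K - (\xi-\bar x)\{(t/n)L - J\} - (t/n)K]^2}{t'\,[(\xi-\bar x)^2J - 2(\xi-\bar x)(t/n)K + (t/n)^2L]}\\
\text{Departure from proportionality} & q-1 & \dfrac{tt'(JL-K^2)}{n^2[(\xi-\bar x)^2J - 2(\xi-\bar x)(t/n)K + (t/n)^2L]}\\
\text{Error} & n-q-1 & 1\\
\text{Total} & n-1 & 1 + \dfrac{t}{t'}\{J + 2(\xi-\bar x)K + (\xi-\bar x)^2L\}
\end{array}$$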


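The sum of squares for departure from proportionality may be written

$$\frac{(t/n)(JL-K^2)}{(n/t')\,[(\xi-\bar x)^2J - 2(\xi-\bar x)(t/n)K + (t/n)^2L]}, \qquad (50)$$

where the denominator is the sum of squares for differences among the $p'_j$, or for difference of regressions. Since the numerator does not involve ξ, the optimum estimate of ξ is that which minimizes departure from proportionality and maximizes difference of (proportional) regressions.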
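If X is the estimate of ξ, the equation for X is

$$(X-\bar x)^2K - (X-\bar x)\{(t/n)L - J\} - (t/n)K = 0, \qquad (51)$$

so that

$$X = \bar x + \frac{(t/n)L - J \pm\sqrt{\{(t/n)L - J\}^2 + 4(t/n)K^2}}{2K}.$$

Regarded as a quadratic in $X - \bar x$, (51) has product of roots $-t/n < 0$, so the two roots lie on opposite sides of $\bar x$.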


In other contexts, an analysis similar to this one will provide a test for the constancy of a set of ratios, and fiducial limits for their common expected value.
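Equation (51) is a quadratic in $u = X - \bar x$, so both roots can be computed directly. The sketch assumes the definitions $J = n\sum\sum v^{jk}\bar y_j\bar y_k$, $K = n\sum\sum v^{jk}\bar y_jb_k$ and $L = n\sum\sum v^{jk}b_jb_k$, with V the residual SP matrix of the unrestricted lines. `estimate_xi` is a hypothetical name. The test uses synthetic data with lines meeting at ξ = 100.

```python
import numpy as np

def estimate_xi(x, Y):
    """Both roots X of (51):
    (X - xbar)^2 K - (X - xbar){(t/n)L - J} - (t/n)K = 0."""
    n, q = Y.shape
    xbar, ybar = x.mean(), Y.mean(axis=0)
    dx = x - xbar
    t = dx @ dx
    b = (dx @ Y) / t                              # unrestricted slopes
    resid = Y - ybar - np.outer(dx, b)
    Vinv = np.linalg.inv(resid.T @ resid)         # V^{-1}
    J = n * ybar @ Vinv @ ybar
    K = n * ybar @ Vinv @ b
    L = n * b @ Vinv @ b
    u = np.roots([K, -((t / n) * L - J), -(t / n) * K])   # u = X - xbar
    return xbar + np.sort(u.real)
```

One root sits near the common meeting point of the lines, where departure from proportionality is smallest. The other lies on the far side of $\bar x$.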


REFERENCES 


… (1951). Proc. Aust. Pulp Paper Ind. Tech. Ass. 5, 315-35.


FISHER, HANS, HANSEN, R. G. & NORTON, H. W. (1955). Quantitative determination of glucose and galactose. Analyt. Chem. 27, 857-9.


VAN DER WAERDEN, B. L. (1931). Moderne Algebra. Berlin: Springer. 


ONE-WAY VARIANCES IN A TWO-WAY CLASSIFICATION


BY THOMAS S. RUSSELL AND RALPH ALLAN BRADLEY*
Virginia Agricultural Experiment Station of the Virginia Polytechnic Institute
1. INTRODUCTION


This research is concerned with the estimation of error variances in a non-replicated two-way classification and with inferences based on the estimators derived. The results are of interest in a wide class of applications. In general, the procedures developed may be used in checking the assumption of homogeneous error variances in randomized block designs under certain conditions. In particular, they may be used in comparing the precisions of analytical methods in quantitative experimentation and the consistencies of judges in subjective experimentation. Some of the results resemble those obtained by Bartlett (1937), who considered the comparison of estimates of variance from independent samples.

Papers by Grubbs (1948) and Ehrenberg (1950) relate to our problem. We shall comment 
on these papers in turn. 

Grubbs was interested in measuring the burning time of a powder-train fuse through the use of several timing instruments attached to a rifle. He required estimates of both powder variability and instrument variability. The model assumed was that

$$y_{ij} = \mu_i + \epsilon_{ij} \quad (i = 1, \ldots, n;\ j = 1, \ldots, r), \qquad (1.1)$$

where, for example,

$y_{ij}$ is the observation on the ith fuse by the jth instrument,

$\mu_i$ is the effect of the ith fuse, and

$\epsilon_{ij}$ is the error in measurement of the ith fuse by the jth instrument.

Both $\mu_i$ and $\epsilon_{ij}$ were assumed to be normally and independently distributed, $\mu_i$ with mean μ and variance $\sigma^2$, $\epsilon_{ij}$ with mean zero and variance $\sigma_j^2$. Grubbs used two estimation procedures that in fact yielded identical estimators for $\sigma_j^2$. It can be shown that his estimators and those obtained later by Ehrenberg are the same as those obtained in this paper. Grubbs suggests that, under certain conditions, a certain function of his estimator and $\sigma_j^2$ has approximately a $\chi^2$ distribution. He also states that other rough test procedures could be used. We shall comment further on these test situations in connection with tests that we develop.

Ehrenberg essentially obtained three different estimators of $\sigma_j^2$, although his model was somewhat different from that of Grubbs and is identical with the one that we shall state in the following section. One estimator was derived by approximating to the solution of a set of equations resulting from application of the method of maximum likelihood and correcting the approximation for bias. This estimator coincides with the one derived by Grubbs and the one that we develop. Ehrenberg obtained the values of the coefficients of a general quadratic form in the $y_{ij}$ required to yield his first estimator, but we shall derive the quadratic form estimator in a somewhat different way. Ehrenberg's second estimator was unbiased but had a variance dependent upon the values $\mu_i$. His third estimator depended on the use of ranges. He noted that it may be presumed that his last estimator is less efficient than the first, as is usual in variance estimation by ranges in comparison with variance estimation by second sampling moments.

In this paper the estimator common to Grubbs and Ehrenberg is re-derived using two different approaches. Certain tests of hypotheses based on these estimators are developed, and we shall show that, under certain conditions, the estimator has an exact distribution which is that of a linear function of two independent $\chi^2$-variates. The approximation of Grubbs (a similar approximation was suggested by Ehrenberg) may be taken to be an approximation to our exact distribution.


2. MATHEMATICAL MODEL 
For the two-way classification, we assume the model

$$y_{ij} = \mu_i + \beta_j + \epsilon_{ij} \quad (i = 1, \ldots, n;\ j = 1, \ldots, r), \qquad (2.1)$$

where

$y_{ij}$ is the observation in the ith row (on the ith item) and jth column (by the jth observer),

$\mu_i$ is a parameter representing the mean of the ith row,

$\beta_j$ is a parameter representing the additional effect of the jth column, and

$\epsilon_{ij}$ is a normal variate with zero mean, independent of any other $\epsilon_{i'j'}$.

The model differs from the usual one of analysis of variance in that here we take $\epsilon_{ij}$ to have a variance

$$V(\epsilon_{ij}) = \sigma_j^2 \quad (i = 1, \ldots, n;\ j = 1, \ldots, r). \qquad (2.2)$$

We shall use the restriction $\sum_j\beta_j = 0$, often employed for determinacy of solution of least squares. We could have written $\mu_i = \mu + \tau_i$ in terms of a mean effect and an effect of the ith row, where $\sum_i\tau_i = 0$, but we did not choose to do so.
i 


… for a possible constant bias $\beta_j$ … type of problem. The use of subjective scores … the precision of the jth method of analysis. Grubbs assumed that $\beta_j = 0$ and that $\mu_i$ was a random variable; however, he was interested in examples like this depending upon objective measurement.

Returning to the model, we introduce a matrix notation that will be useful later. Let Y be a column vector of nr elements $y_{ij}$, or of r column vectors $Y_j$ each with n elements $y_{ij}$. The transpose of Y is

$$Y' = (Y_1', \ldots, Y_r'), \qquad (2.3)$$

and

$$Y_j' = (y_{1j}, \ldots, y_{nj}). \qquad (2.4)$$

We define

$$B' = (\mu_1, \ldots, \mu_n, \beta_1, \ldots, \beta_r) \qquad (2.5)$$

and

$$A' = (A_1', \ldots, A_r'), \qquad (2.6)$$

an (n + r) by nr matrix expressed in terms of (n + r) by n submatrices. We write

$$A_j = [a^{(j)}_{ip}] \quad (i = 1, \ldots, n;\ p = 1, \ldots, n + r), \qquad (2.7)$$

where

$$a^{(j)}_{ip} = 1 \text{ when } p = i \text{ or } (n + j), \qquad a^{(j)}_{ip} = 0 \text{ otherwise}. \qquad (2.8)$$

It is now apparent that

$$Y = AB + \epsilon, \qquad (2.9)$$

where ε is the vector of variates $\epsilon_{ij}$ in positions corresponding to the elements of Y. (2.9) may be taken as a representation of sample observations in terms of the model (2.1). If we take expectations, we have

$$E(Y) = AB. \qquad (2.10)$$


3. MAXIMUM-LIKELIHOOD ESTIMATION 
(i) A direct application 
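The straightforward application of the method of maximum likelihood for estimation of the parameters in B and the $\sigma_j^2$ leads into difficulty. The likelihood function is

$$f(Y) = (2\pi)^{-\frac12 nr}\prod_{j=1}^{r}\sigma_j^{-n}\exp\Big[-\tfrac12\sum_i\sum_j(y_{ij}-\mu_i-\beta_j)^2/\sigma_j^2\Big]. \qquad (3.1)$$

The normal equations, resulting from the process of maximizing or minimizing f(Y) with respect to the parameters, yield solutions for $\hat\mu_i$ and $\hat\beta_j$, but with $\hat\mu_i$ dependent upon the $\hat\sigma_j^2$, and reduction of the normal equations yields

$$\hat\sigma_j^2 = \frac1n\sum_i\Bigg[y_{ij}-\bar y_{.j}-\frac{\sum_{q=1}^{r}(y_{iq}-\bar y_{.q})/\hat\sigma_q^2}{\sum_{q=1}^{r}1/\hat\sigma_q^2}\Bigg]^2 \quad (j = 1, \ldots, r) \qquad (3.2)$$

in the attempt to evaluate $\hat\sigma_j^2$. This is a procedure also considered … Iterative solutions of (3.2) converge yielding a value zero for one $\hat\sigma_j^2$. The o…

$$\hat\sigma_j^2 = \frac1n\sum_i(y_{ij}-\bar y_{.j}-y_{ip}+\bar y_{.p})^2 \quad (j \neq p). \qquad (3.3)$$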

* … that a simple but inconsistent solution to the normal equations related to f(Y) … with r … We leave the reader to refer to his paper for a discussion of this special case.


Biom. 45 
Actually, if any one $\hat\sigma_j^2$ is zero, a solution to (3.2) in the form (3.3) exists for the remaining $\hat\sigma_j^2$. But the evaluation of (3.1) with these solutions minimizes that function, for the evaluation of (3.1) essentially depends on …
(ii) An indirect application 

We have been led into difficulties in our attempt to use maximum likelihood to obtain estimators of the $\sigma_j^2$ in the direct application of that method to the likelihood function f(Y) in (3.1). Now, if each $\sigma_j^2 = \sigma^2$, $\sigma^2$ would be estimated in the usual way of the analysis of variance for a two-way classification. That estimator of $\sigma^2$ depends on (n − 1)(r − 1) linear contrasts formed from the original observations. These contrasts have zero means and do not depend on the $\mu_i$ and $\beta_j$, whether or not $\sigma_j^2 = \sigma^2$ (j = 1, ..., r). The estimator of $\sigma^2$ of the analysis of variance is not the maximum-likelihood estimator obtainable from (3.1) with $\sigma_j^2 = \sigma^2$. But it is the maximum-likelihood estimator with reference to the likelihood function of any set of (n − 1)(r − 1) linear, and linearly independent, error contrasts. The difficulties in estimation based on (3.1) stem from the simultaneous estimation of the $\mu_i$, $\beta_j$ and $\sigma_j^2$. This suggests that they may be circumvented if we first transform our data and then restrict our attention to a set of (n − 1)(r − 1) error contrasts for the estimation of the $\sigma_j^2$. We now proceed in this way.

We use a set of (n − 1)(r − 1) error contrasts that have a reasonably simple variance-covariance matrix. Let Z be a column vector of (n − 1)(r − 1) new variates defined by

$$Z = CY. \qquad (3.5)$$

C, the matrix of the transformation, has (n − 1)(r − 1) rows and nr columns defined by the 'direct product of matrices'* $D_n$ and $(-D_r)$,

$$C = D_n\times(-D_r). \qquad (3.6)$$

The matrix $D_n$ has (n − 1) rows and n columns and is

$$D_n = \begin{pmatrix}1 & -1 & 0 & \cdots & 0\\ 1 & 0 & -1 & \cdots & 0\\ \vdots & & & \ddots & \vdots\\ 1 & 0 & 0 & \cdots & -1\end{pmatrix}. \qquad (3.7)$$

$D_r$ has the same form with (r − 1) rows and r columns. The elements of Z form one of the possible sets of (n − 1)(r − 1) linearly and stochastically independent contrasts associated with the error variance of the analysis of variance for the two-way classification when $\sigma_j^2 = \sigma^2$ (j = 1, ..., r). It is clear, even when $\sigma_j^2 \neq \sigma^2$, that

$$CA = 0 \qquad (3.8)$$

* The direct-product notation is as used by MacDuffee (1946) and is the same as the Kronecker product used by Vartak (1955). The notation here implies that each element in $-D_r$ acts as a scalar multiplying $D_n$; every element in $-D_r$ is replaced by the product of the element and $D_n$. Then C has (n − 1)(r − 1) rows and nr columns as indicated above.
and

$$E(Z) = 0$$

in view of (2.9).
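The construction of C and the property CA = 0 can be checked on a small case. The sketch assumes the contrast form of (3.7) shown above, with rows (1, −1, 0, ...), (1, 0, −1, ...) and so on. Y is stacked column by column as in (2.3). MacDuffee's $D_n\times(-D_r)$ is written as NumPy's `np.kron(-Dr, Dn)`. Helper names are hypothetical.

```python
import numpy as np

def contrast_matrix(m):
    """(m - 1) x m matrix with rows (1, -1, 0, ..., 0), (1, 0, -1, ..., 0), ..."""
    return np.hstack([np.ones((m - 1, 1)), -np.eye(m - 1)])

def design_matrix(n, r):
    """A of (2.9): Y (stacked column by column) = A B + eps, with
    B = (mu_1, ..., mu_n, beta_1, ..., beta_r)."""
    A = np.zeros((n * r, n + r))
    for j in range(r):
        for i in range(n):
            A[j * n + i, i] = 1.0          # mu_i
            A[j * n + i, n + j] = 1.0      # beta_j
    return A

n, r = 5, 3
Dn, Dr = contrast_matrix(n), contrast_matrix(r)
C = np.kron(-Dr, Dn)       # each element of -D_r multiplies D_n
A = design_matrix(n, r)
```

Each row of $D_n$ and $D_r$ sums to zero, so every contrast removes both the row effects and the column effects.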


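The variance-covariance matrix $\Sigma_Z$ of the new variates in Z is

$$\Sigma_Z = C\Sigma C', \qquad (3.9)$$

$$\Sigma_Z = [D_n\times(-D_r)]\,\Sigma\,[D_n\times(-D_r)]', \qquad (3.10)$$

where

$$\Sigma = \begin{pmatrix}\sigma_1^2I_n & 0 & \cdots & 0\\ 0 & \sigma_2^2I_n & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & \sigma_r^2I_n\end{pmatrix}, \qquad (3.11)$$

and $I_n$ is the n by n identity matrix. An alternate form for $\Sigma_Z$, apparent when the multiplication is effected in (3.10), is

$$\Sigma_Z = (D_nD_n')\times H, \qquad (3.12)$$

where H is (r − 1)-square and written

$$H = \begin{pmatrix}\sigma_1^2+\sigma_2^2 & \sigma_1^2 & \cdots & \sigma_1^2\\ \sigma_1^2 & \sigma_1^2+\sigma_3^2 & \cdots & \sigma_1^2\\ \vdots & & \ddots & \vdots\\ \sigma_1^2 & \sigma_1^2 & \cdots & \sigma_1^2+\sigma_r^2\end{pmatrix}. \qquad (3.13)$$

The joint density function of the new variates is

$$f(Z) = (2\pi)^{-\frac12(n-1)(r-1)}|\Sigma_Z|^{-\frac12}\exp\big(-\tfrac12Z'\Sigma_Z^{-1}Z\big). \qquad (3.14)$$

It follows that

$$\Sigma_Z^{-1} = (D_nD_n')^{-1}\times H^{-1}, \qquad (3.15)$$

where, in view of (3.7),

$$(D_nD_n')^{-1} = \frac1n\begin{pmatrix}n-1 & -1 & \cdots & -1\\ -1 & n-1 & \cdots & -1\\ \vdots & & \ddots & \vdots\\ -1 & -1 & \cdots & n-1\end{pmatrix}, \qquad (3.16)$$

and $H^{-1}$ has typical element

$$\frac{\delta_{jk}}{\sigma_j^2} - \frac{1}{\sigma_j^2\sigma_k^2\sum_{l=1}^{r}\sigma_l^{-2}} \quad (j, k = 2, \ldots, r), \qquad (3.17)$$

with … (3.18)

It is easy to verify that (3.16) and (3.17) are correct. $(D_nD_n')$ is an (n − 1)-square matrix.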


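We shall maximize $L = \ln f(Z)$ from (3.14) with respect to $\omega_j$, where

$$\omega_j = \sigma_j^{-2} \quad (j = 1, \ldots, r), \qquad (3.19)$$

and we use 'ln' to denote 'natural logarithm'. When maximum-likelihood estimators of the $\omega_j$ are available, (3.19) is solved for the estimators of $\sigma_j^2$. Now

$$\frac{\partial L}{\partial\omega_j} = \frac12\frac{\partial\ln|\Sigma_Z^{-1}|}{\partial\omega_j} - \frac12Z'\frac{\partial\Sigma_Z^{-1}}{\partial\omega_j}Z. \qquad (3.20)$$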


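Let

$$\Sigma_Z^{-1} = [\sigma^{su}], \qquad (3.21)$$

where $\sigma^{su}$ is the (s, u) element of $\Sigma_Z^{-1}$, and let $\sigma_{su}$ be the (s, u) element of $\Sigma_Z$. It follows that the first term of the right-hand member of (3.20) becomes

$$\tfrac12\sum_s\sum_u\sigma_{su}\frac{\partial\sigma^{su}}{\partial\omega_j}$$

and the second term becomes

$$-\tfrac12\sum_s\sum_u z_sz_u\frac{\partial\sigma^{su}}{\partial\omega_j}.$$

Then, when $\partial L/\partial\omega_j$ is equated to zero, we have

$$\sum_s\sum_u(\sigma_{su} - z_sz_u)\frac{\partial\sigma^{su}}{\partial\omega_j} = 0 \quad (j = 1, \ldots, r). \qquad (3.22)$$

Solution of (3.22) yields the maximum-likelihood estimators $\hat\sigma_j^2$ of $\sigma_j^2$. Note that

$$\frac{\partial\Sigma_Z^{-1}}{\partial\omega_j} = (D_nD_n')^{-1}\times\frac{\partial H^{-1}}{\partial\omega_j} \qquad (3.23)$$

from (3.15) and because $(D_nD_n')$ is a matrix of constants.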
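The structure used in (3.12)-(3.17) can be checked numerically for a small case. The sketch assumes the contrast form of $D_n$ shown in (3.7) and hypothetical variances $\sigma_j^2$. It confirms four things: $C\Sigma C'$ equals NumPy's `np.kron(H, Dn @ Dn.T)`, the inverse of $D_nD_n'$ has the pattern of (3.16), H has the pattern of (3.13), and $H^{-1}$ has the elements of (3.17).

```python
import numpy as np

def contrast_matrix(m):
    """(m - 1) x m matrix with rows (1, -1, 0, ..., 0), (1, 0, -1, ..., 0), ..."""
    return np.hstack([np.ones((m - 1, 1)), -np.eye(m - 1)])

n, r = 5, 4
sig2 = np.array([1.0, 2.0, 0.5, 3.0])          # hypothetical sigma_j^2
Dn, Dr = contrast_matrix(n), contrast_matrix(r)
C = np.kron(-Dr, Dn)                           # D_n x (-D_r) in MacDuffee's notation
Sigma = np.kron(np.diag(sig2), np.eye(n))      # block-diagonal sigma_j^2 I_n
Sigma_Z = C @ Sigma @ C.T                      # covariance matrix of Z = CY

G = Dn @ Dn.T                                  # D_n D_n'
H = Dr @ np.diag(sig2) @ Dr.T                  # the (r - 1)-square matrix H
omega = 1.0 / sig2
H_inv_formula = np.diag(omega[1:]) - np.outer(omega[1:], omega[1:]) / omega.sum()
```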
We have been able to solve the equations (3.22) when r = 3. Then the equations reduce to

$$2(n-1)\hat\sigma_1^2 = Z'\Big[(D_nD_n')^{-1}\times\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}\Big]Z, \qquad (3.24)$$

$$2(n-1)\hat\sigma_2^2 = Z'\Big[(D_nD_n')^{-1}\times\begin{pmatrix}2 & -1\\ -1 & 0\end{pmatrix}\Big]Z, \qquad (3.25)$$

and

$$2(n-1)\hat\sigma_3^2 = Z'\Big[(D_nD_n')^{-1}\times\begin{pmatrix}0 & -1\\ -1 & 2\end{pmatrix}\Big]Z. \qquad (3.26)$$

We shall not solve these equations here but delay solution until the next section, where the solutions will be compared with those obtained by another method. When r > 3, solution of equations (3.22) involves the solution of simultaneous polynomial equations in the $\hat\sigma_j^2$. When r = 2, solution of the two equations represented by (3.22) is impossible: the equations become identical except for a constant multiplier, and a solution for $(\sigma_1^2 + \sigma_2^2)$ only is possible.*
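For r = 3 the map between H and $(\sigma_1^2, \sigma_2^2, \sigma_3^2)$ is one-to-one. The likelihood of the contrasts is therefore maximized by setting H equal to its sample version, $\hat H = W/(n-1)$ with $W = Z_{\text{mat}}'(D_nD_n')^{-1}Z_{\text{mat}}$. The sketch assumes the contrast form of (3.7). `ml_variances_r3` is a hypothetical helper. The test compares the result with the familiar form $S_{11} - S_{12} - S_{13} + S_{23}$ built from the sample covariances of the columns.

```python
import numpy as np

def contrast_matrix(m):
    """(m - 1) x m matrix with rows (1, -1, 0, ..., 0), (1, 0, -1, ..., 0), ..."""
    return np.hstack([np.ones((m - 1, 1)), -np.eye(m - 1)])

def ml_variances_r3(Y):
    """Y: n x 3 array (rows items, columns observers).  Returns the
    maximum-likelihood (sigma_1^2, sigma_2^2, sigma_3^2) from the contrasts Z,
    reading sigma's off H-hat = W / (n - 1) via the pattern of H in (3.13)."""
    n, r = Y.shape
    Dn, Dr = contrast_matrix(n), contrast_matrix(r)
    Zmat = Dn @ Y @ Dr.T            # (n-1) x 2; the blocks of Z, up to sign
    W = Zmat.T @ np.linalg.solve(Dn @ Dn.T, Zmat)
    Hh = W / (n - 1)
    return np.array([Hh[0, 1], Hh[0, 0] - Hh[0, 1], Hh[1, 1] - Hh[0, 1]])
```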

4. QUADRATIC FORM ESTIMATION 


A quadratic form in the original observations $y_{ij}$ would be a desirable form for an estimator of $\sigma_t^2$ (t = 1, ..., r). Consider the general form

$$Q_t = \sum_i\sum_h\sum_j\sum_k m_{ijhk}(t)\,y_{ij}y_{hk} \quad (i, h = 1, \ldots, n;\ j, k = 1, \ldots, r). \qquad (4.1)$$

$Q_t$ is determined when values of $m_{ijhk}(t)$ are determined. This is done by imposing reasonable restrictions of symmetry on the $m_{ijhk}(t)$ and by requiring† that $E(Q_t) = \sigma_t^2$ independently of the $\mu_i$ and $\beta_j$.


† … in some cases, but, if variance estimators from … are to be compared, the disadvantage may be quite real. The estimators obtained here will not have distributions depending on the $\mu_i$ and $\beta_j$.
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Conditions to be satisfied by Q, are 
(i) Q, must be invariant under interchange of order of items. 

(ii) Q, must be independent of the parameters A, and A, 

(iii) Q, must be an unbiased estimator of of. 

We think of M, the matrix of the form (4-1) as an r by r matrix of matrix elementa M, 
which are themselves n by n matrices with elements Pha, in the (i, A)-position of M. 
We drop the argument ¢ with M and mejas for simplicity except where this may lead to 
ambiguity. 

Condition (i) implies that mn = m (4:2) 


and Mijat = Majin. (+3) 
These equations mean that M;, is symmetric with equal diagonal elementa. Condition (ii) 
requires that Q, should not depend on y, and , and this essentially means that Q, is a fune- 
tion of the e of (2-1). Condition (ii) is satisfied if 


Lu — 0 (4-4) 
and Don - 0. (4:5) 


Equation (4-4) states that the sum of corresponding elements from each A (or A in any 
column (or row) of M is zero and equation (4-5) states that the sum of the elementa in any 
row (or column) of M is zero. In order that E(Q,) = cas required by (iii), itis necessary that 


Ema = 1 (4:6) 
7 
and X mig = *. (4:7) 


Equations (4-2) through (4-7) are not sufficient to determine $M$ uniquely when $r > 2$, but are inconsistent* when $r = 2$. Additional assumptions of symmetry are imposed when $r > 2$, and the following discussion is limited to the case $r > 2$. The elements $m_{hijk}$ of $M$ may be regarded as weights assigned to the observation products $y_{ij} y_{hk}$ in $Q_t$. Differences in weights should depend on the variances of the observations in these products. Usually no a priori knowledge of these weights will be available, and in addition we shall be concerned with tests of null hypotheses postulating equal variances and hence equal weights. We therefore regard it as reasonable to assume equal variances $\sigma_j^2$ ($j \neq t$), which leads to a further specification of the elements of $M$. The weighting system used in the estimation of $\sigma_t^2$ and in the determination of $M$ requires that


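$$M_{jj} = M_{kk} \quad (j, k \neq t;\ j, k = 1, \ldots, r), \tag{4-8}$$

$$M_{pq} = M_{uv} \quad (p \neq q,\ u \neq v;\ p, q, u, v \neq t;\ p, q, u, v = 1, \ldots, r), \tag{4-9}$$

$$M_{tj} = M_{tk} \quad (j, k \neq t;\ j, k = 1, \ldots, r), \tag{4-10}$$

and

$$m_{hijk} = m_{gljk} \quad (h \neq i,\ g \neq l;\ g, h, i, l = 1, \ldots, n;\ j, k = 1, \ldots, r). \tag{4-11}$$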
Using all of the relations, (4-2) to (4-11), we obtain 

$$n(n-1)(r-1)(r-2)\,M_{tt} = (r-1)(r-2)\,\bar{M}, \tag{4-12}$$

$$M_{qq} = 0 \quad (q \neq t), \tag{4-13}$$

$$n(n-1)(r-1)(r-2)\,M_{tq} = n(n-1)(r-1)(r-2)\,M_{qt} = -(r-2)\,\bar{M} \quad (q \neq t), \tag{4-14}$$

and

$$n(n-1)(r-1)(r-2)\,M_{uv} = \bar{M} \quad (u \neq v;\ u, v \neq t), \tag{4-15}$$

where $\bar{M}$ is a matrix with diagonal and non-diagonal elements identical to those of $n(D_n D_n')^{-1}$ defined in (3-16), except that $\bar{M}$ is an $n$-square matrix; that is, $\bar{M}$ has diagonal elements $n - 1$ and non-diagonal elements $-1$.

* The inconsistency of the equations when $r = 2$ indicates the impossibility of obtaining estimators with the required properties in this special case.
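The block structure in (4-12) to (4-15) can be checked numerically. The sketch below is a minimal illustration in standard-library Python, not the authors' procedure. It assumes that $Y$ is stacked column by column, so that $y_{ij}$ sits at position $jn + i$ with 0-based indices, and all function names are hypothetical. The code assembles $M$ for given $n$, $r$ and $t$, evaluates $Y'MY$, and compares the result with the residual form (4-26). The accompanying checks confirm the zero-sum conditions and the trace conditions (4-6) and (4-7).

```python
import random

def build_M(n, r, t):
    """Assemble the nr x nr matrix M of Q_t = Y'MY from the blocks (4-12)-(4-15).
    Assumption: Y is stacked column by column, so y_ij (0-based) sits at j*n + i;
    t is also 0-based here."""
    scale = n * (n - 1) * (r - 1) * (r - 2)

    def mbar(h, i):
        # Mbar: n x n matrix with diagonal n - 1 and off-diagonal -1
        return n - 1 if h == i else -1

    def block_coef(j, k):
        if j == k == t:
            return (r - 1) * (r - 2)   # (4-12)
        if j == k:
            return 0                   # (4-13)
        if t in (j, k):
            return -(r - 2)            # (4-14)
        return 1                       # (4-15)

    size = n * r
    M = [[0.0] * size for _ in range(size)]
    for j in range(r):
        for k in range(r):
            c = block_coef(j, k)
            for h in range(n):
                for i in range(n):
                    M[j * n + h][k * n + i] = c * mbar(h, i) / scale
    return M

def q_from_M(y, t):
    """Evaluate the quadratic form Y'MY for an n x r table y."""
    n, r = len(y), len(y[0])
    M = build_M(n, r, t)
    Y = [y[i][j] for j in range(r) for i in range(n)]
    return sum(Y[a] * M[a][b] * Y[b] for a in range(n * r) for b in range(n * r))

def q_from_residuals(y, t):
    """Evaluate (4-26) from the two-way residuals y_ij - y_i. - y_.j + y.."""
    n, r = len(y), len(y[0])
    row = [sum(v) / r for v in y]
    col = [sum(y[i][j] for i in range(n)) / n for j in range(r)]
    grand = sum(row) / n
    d = [[y[i][j] - row[i] - col[j] + grand for j in range(r)] for i in range(n)]
    G_t = sum(d[i][t] ** 2 for i in range(n))
    E = sum(v * v for dr in d for v in dr)
    return (r * (r - 1) * G_t - E) / ((n - 1) * (r - 1) * (r - 2))

rng = random.Random(0)
y = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]
for t in range(4):
    print(t, q_from_M(y, t), q_from_residuals(y, t))
```

A dense matrix is used only for transparency. For real data, the residual form is far cheaper to evaluate.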

We shall now show that the estimators obtained in this section are equivalent to those obtained from maximum likelihood on the contrasts of $Z$ in §3 in the case where $n > 2$ and $r = 3$. The equivalence does not follow when $r > 3$. We shall demonstrate that $\hat\sigma_1^2 = Q_1$; the remaining two similar identities follow from symmetry. $\hat\sigma_1^2$ is defined through (3-24) in terms of the elements of $Z$. Rewriting (3-24) in view of (3-5), we have


2(n—1) 6% = Y'C' | anf fos — ful CY. (416) 


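Also, in view of (4-12) to (4-15) and (4-1), in the special case with $t = 1$, $r = 3$ and $n > 2$,

$$2n(n-1)\,Q_1 = Y'\left\{\begin{bmatrix} 2 & -1 & -1 \\ -1 & 0 & 1 \\ -1 & 1 & 0 \end{bmatrix} \times \bar{M}\right\} Y. \tag{4-17}$$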


Evaluation of (4-16) follows from (3-23), (3-17) and (3-13). When $r = 3$,


1 +o -o 
CAS) LEOS 1 18 
e E 
a 1 -1 D 0 0 a 1 
and H = n EC EIE us 19 
du, Hir 1]: do, Il AT a. =|| 0 o |: “a 
Then ele, = D, bn 1 6 
0 
(4-20) 
-| io Supa P 
(D, Dp) 0 à 
Now 6 = D,x( — D,) as given in (3:6) and simple matrix algebra yields 
à 2 -1 1 
O'l Sim fef] C= DAD, DD. | -1 0 1 es 
-1 1 0 


It is again a matter of matrix multiplication based on (3-7) and (3-16) to show that


DOD, Din D, = /n; 

this is all that is needed to show that the matrices of the quadratic forms in the right-hand members of (4-16) and (4-17) are identical. We shall use the result that $\hat\sigma_j^2 = Q_j$ when $n > 2$ and $r = 3$ in the next section.

The form of $Q_t$ may be simplified through some extremely cumbersome algebra. We shall simply sketch this reduction and note that further details are given by Russell (1956).
$M$, the matrix of $Q_t$, has been defined in (4-12) to (4-15). We now consider the matrix form

$$n(n-1)(r-1)(r-2)\,Q_t = Y'\mu Y, \tag{4-22}$$


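where $n(n-1)(r-1)(r-2)M = \mu$. We shall let $\mu_{ij}$ represent the $(i, j)$th column vector of $\mu$; $\mu_{ij}$ contains $nr$ elements. To begin the simplification of $Q_t$, it is shown that

$$Y'\mu_{ij} = nr(y_{i\cdot} - y_{\cdot\cdot}) - n(r-1)(y_{it} - y_{\cdot t}) - n(y_{ij} - y_{\cdot j}) \quad (j \neq t) \tag{4-23}$$

and

$$Y'\mu_{it} = nr(r-2)(y_{it} - y_{\cdot t}) - nr(r-2)(y_{i\cdot} - y_{\cdot\cdot}). \tag{4-24}$$

The notation is conventional:

$$y_{i\cdot} = \frac{1}{r}\sum_{j=1}^{r} y_{ij}, \qquad y_{\cdot j} = \frac{1}{n}\sum_{i=1}^{n} y_{ij}, \qquad y_{\cdot\cdot} = \frac{1}{nr}\sum_{i=1}^{n}\sum_{j=1}^{r} y_{ij}. \tag{4-25}$$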


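Now (4-23) and (4-24) define the $nr$ elements of the row vector $Y'\mu$, and the vector multiplication $(Y'\mu)Y$ yields a result that may be reduced to the final form

$$(n-1)(r-1)(r-2)\,Q_t = r(r-1)\sum_{i=1}^{n}(y_{it} - y_{i\cdot} - y_{\cdot t} + y_{\cdot\cdot})^2 - \sum_{i=1}^{n}\sum_{j=1}^{r}(y_{ij} - y_{i\cdot} - y_{\cdot j} + y_{\cdot\cdot})^2. \tag{4-26}$$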


$Q_t$ in (4-26) is identical with the estimators recommended by Grubbs and Ehrenberg and is now expressed in the form that they used.

Note that $Q_t$ may be negative. If this causes difficulty, it is an interpretative one similar to that experienced in the estimation of variance components. In the special case with $r = 3$, it can be shown that only one $Q_t$ may be negative: it may be seen from (5-4) and (5-5) in the following section that the sum of any two $Q$'s must be non-negative. A general rule when $r > 3$ is not evident.

To conclude this section, we note a result obtained by both Grubbs and Ehrenberg. They found the variance of $Q_t$:

$$V(Q_t) = \frac{2}{n-1}\left[\sigma_t^4 + \frac{2\sigma_t^2}{(r-1)^2}\sum_{h \neq t}\sigma_h^2 + \frac{1}{(r-1)^2(r-2)^2}\sum_{h \neq t}\ \sum_{j \neq h, t}\sigma_h^2\sigma_j^2\right]. \tag{4-27}$$

This result may be used to obtain a $\chi^2$ approximation to the distribution of $Q_t/\sigma^2$ when $\sigma_j^2 = \sigma^2$ ($j = 1, \ldots, r$), as discussed in the next section. There, however, $Q_t$ is expressed in a different form, and its moments may be obtained more directly.
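Condition (iii) and the variance (4-27) can be checked by simulation. The sketch below is a minimal illustration using standard-library Python, with hypothetical function names. It draws tables $y_{ij} = \text{row}_i + \text{col}_j + e_{ij}$ with $\text{var}(e_{ij}) = \sigma_j^2$, computes every $Q_t$ from (4-26), and compares the empirical mean and variance with $\sigma_t^2$ and with (4-27). The row and column effects are redrawn at random in each replicate to illustrate condition (ii). The chosen variances $(1, 4, 0.25)$ and $n = 10$ are arbitrary.

```python
import random
import statistics

def q_estimates(y):
    """Q_1, ..., Q_r for an n x r table, computed from (4-26)."""
    n, r = len(y), len(y[0])
    row = [sum(v) / r for v in y]
    col = [sum(y[i][j] for i in range(n)) / n for j in range(r)]
    grand = sum(row) / n
    d = [[y[i][j] - row[i] - col[j] + grand for j in range(r)] for i in range(n)]
    G = [sum(d[i][j] ** 2 for i in range(n)) for j in range(r)]
    E = sum(G)
    return [(r * (r - 1) * g - E) / ((n - 1) * (r - 1) * (r - 2)) for g in G]

def var_q(t, sig2, n):
    """Variance of Q_t from (4-27); sig2 lists sigma_1^2, ..., sigma_r^2 (t is 0-based)."""
    r = len(sig2)
    others = [s for j, s in enumerate(sig2) if j != t]
    ordered_pairs = sum(a * b for p, a in enumerate(others)
                        for q, b in enumerate(others) if p != q)
    return 2 / (n - 1) * (sig2[t] ** 2
                          + 2 * sig2[t] * sum(others) / (r - 1) ** 2
                          + ordered_pairs / ((r - 1) ** 2 * (r - 2) ** 2))

def simulate(sig2, n, reps, seed=0):
    """Draw reps tables y_ij = row_i + col_j + e_ij with var(e_ij) = sig2[j]."""
    rng = random.Random(seed)
    r = len(sig2)
    draws = []
    for _ in range(reps):
        # arbitrary row and column effects; Q_t should not depend on them
        rows = [rng.uniform(-5, 5) for _ in range(n)]
        cols = [rng.uniform(-5, 5) for _ in range(r)]
        y = [[rows[i] + cols[j] + rng.gauss(0, sig2[j] ** 0.5) for j in range(r)]
             for i in range(n)]
        draws.append(q_estimates(y))
    return draws

sig2, n = [1.0, 4.0, 0.25], 10
draws = simulate(sig2, n, reps=4000)
for t in range(3):
    sample = [d[t] for d in draws]
    print(t + 1, statistics.mean(sample), sig2[t],
          statistics.variance(sample), var_q(t, sig2, n))
```

With all variances equal to $\sigma^2$ and $r = 3$, `var_q` reduces to $5\sigma^4/(n-1)$.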


5. DISTRIBUTION THEORY AND TESTS OF SIGNIFICANCE 


In this section we shall consider the nature of the distribution of $Q_t$ and propose several tests of significance for hypotheses of assumed equality of the variances $\sigma_j^2$. To facilitate this work, it is desirable to express $Q_t$ in terms of $(n-1)(r-1)$ new variables instead of the original $nr$ observations shown in (4-26). We shall consider this latter problem first and only indicate the algebraic procedure.
Consider the transformation

$$\eta = WY \tag{5-1}$$

of the $nr$ observations in the column vector $Y$, and take $W$ to be orthogonal. Many matrices $W$ would suffice, but we take $W$ as described below. Let $L_n$ be the Helmert matrix of $n$ rows and $n$ columns with the row of identical elements placed in the last, or $n$th, row position, and define $L_r$ similarly. The matrix

$$\Gamma = L_r \times L_n \tag{5-2}$$

is orthogonal. $W$ is the matrix obtained by placing the $jn$th row of $\Gamma$ in the $[(n-1)(r-1)+j]$th position of $W$ [$j = 1, \ldots, (r-1)$], while the remaining $(n-1)(r-1)$ of the first $n(r-1)$ rows of $\Gamma$ are retained in order in the first $(n-1)(r-1)$ row positions of $W$. The last $n$ rows of $W$ are identical to the corresponding rows of $\Gamma$. The inverse transformation to (5-1) is required for substitution of the new variables in the form (4-26) for $Q_t$. Then

$$Y = W'\eta. \tag{5-3}$$


Substitution in (4-26), taking advantage of the orthogonality of $W$, lets us write $Q_t$ in terms of the first $(n-1)(r-1)$ new variables, $\eta_1, \ldots, \eta_{(n-1)(r-1)}$. (We are using a single-subscript notation for the new variables.) The algebraic details are given by Russell (1956), and we simply note here that


(n—1) -1). 
(n—1)(r—1)(r-2)Q, = r(r— 1) P "E i ld. 


(r—2) 


z abu Big i rn 772 
b 0 2% 2, % 


where $t = 1, \ldots, r$ and, when $t = r$, the second term in parentheses is defined to be zero. When $t = r$, we have the simpler form


(5-4)


$$(n-1)(r-1)(r-2)\,Q_r = r(r-2)\sum_{u=1}^{n-1}\eta_u^2 - \sum_{u=1}^{(n-1)(r-2)}\eta_{n-1+u}^2. \tag{5-5}$$
The asymmetry of the forms of $Q_t$ for various values of $t$ is introduced by the transformation (5-1). Rearranging the rows of $W$ would allow any specified $Q_t$ to be shown in a form like that for $Q_r$ in (5-5). It follows from the form of $W$ that $\eta_1, \ldots, \eta_{(n-1)(r-1)}$ have zero means, are normal, and are independent with equal variances $\sigma^2$ if $\sigma_j^2 = \sigma^2$ ($j = 1, \ldots, r$).


Test 1
Consider the given conditions:
Conditions 1: $\sigma_j^2 = \sigma^2$ [$j = 1, \ldots, (r-1)$], $\sigma^2$ known, (5-6)
and the null hypothesis, $H_0: \sigma_r^2 = \sigma^2$. (5-7)
The alternative hypothesis is $H_a: \sigma_r^2 \neq \sigma^2$, (5-8)
but with Conditions 1 retained. We compare $Q_r$ with $\sigma^2$, and we need the distribution of $Q_r/\sigma^2$ under Conditions 1 and $H_0$. Now $\eta_1/\sigma, \ldots, \eta_{(n-1)(r-1)}/\sigma$ are independent standard normal variates, and the right-hand member of (5-5) depends on two independent sums of squares. If we divide both sides of (5-5) by $\sigma^2$, we note that

$$\frac{(n-1)(r-1)(r-2)\,Q_r}{\sigma^2} = r(r-2)\,\chi^2_{n-1} - \chi^2_{(n-1)(r-2)}, \tag{5-9}$$

where the $\chi^2$'s have the indicated degrees of freedom and are independent. The distribution of the difference of two independent $\chi^2$-variates has been studied by
Pachares (1953) and Gurland (1955). Pachares has shown that an indefinite quadratic form has a density function that can be expressed in terms of Bessel functions; Gurland used a finite series of Laguerre polynomials. In the special case in which $n = r = 3$, it is easy to see that the density function of $q = Q_3/\sigma^2$ is

$$f(q) = \tfrac{1}{2}e^{-2q/3} \quad (q > 0), \qquad f(q) = \tfrac{1}{2}e^{2q} \quad (q < 0).$$
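A quick Monte Carlo check of the $n = r = 3$ density is sketched below in standard-library Python, with hypothetical function names. Under equal unit variances, the code simulates $3 \times 3$ tables, computes $Q_3$ from (4-26), and compares three empirical quantities with the values implied by $f(q)$: $P(q < 0) = 1/4$, $P(q > 1) = \tfrac{3}{4}e^{-2/3}$, and $E(q) = 1$. The seed and the number of replicates are arbitrary choices.

```python
import math
import random

def q_last_column(y):
    """Q_r (the estimate for the last column) of an n x r table, via (4-26)."""
    n, r = len(y), len(y[0])
    row = [sum(v) / r for v in y]
    col = [sum(y[i][j] for i in range(n)) / n for j in range(r)]
    grand = sum(row) / n
    d = [[y[i][j] - row[i] - col[j] + grand for j in range(r)] for i in range(n)]
    G_r = sum(d[i][r - 1] ** 2 for i in range(n))
    E = sum(v * v for dr in d for v in dr)
    return (r * (r - 1) * G_r - E) / ((n - 1) * (r - 1) * (r - 2))

def density(q):
    """Density of q = Q_3 / sigma^2 for n = r = 3 under equal variances."""
    return 0.5 * math.exp(-2 * q / 3) if q > 0 else 0.5 * math.exp(2 * q)

rng = random.Random(1)
qs = [q_last_column([[rng.gauss(0, 1) for _ in range(3)] for _ in range(3)])
      for _ in range(20000)]

p_neg = sum(q < 0 for q in qs) / len(qs)   # density gives 1/4
p_gt1 = sum(q > 1 for q in qs) / len(qs)   # density gives (3/4) exp(-2/3)
mean_q = sum(qs) / len(qs)                 # density gives 1 (Q_3 is unbiased)
print(p_neg, p_gt1, 0.75 * math.exp(-2 / 3), mean_q)
```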


An examination of the moments of $Q_r/\sigma^2$ suggests that its distribution may be approximated by taking $CQ_r/\sigma^2$ to have a $\chi^2$ distribution with $C$ degrees of freedom, where

$$C = \frac{(n-1)(r-1)(r-2)}{r^2 - r - 1}. \tag{5-10}$$

Grubbs essentially used this approximation with $r = 3$, but the goodness of the approximation is not known. It does not seem necessary to attempt to table the exact distribution of $Q_r/\sigma^2$ in view of the results of Test 2 below.

Note that symmetry does exist, and our discussion here applies not only to $Q_r$ but to any $Q_j$ selected from $j = 1, \ldots, r$. However, correlations between the $Q$'s do exist.
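The choice of $C$ in (5-10) can be seen as matching the first two moments. By (5-9), $Q_r/\sigma^2$ has mean 1, and $C$ is chosen so that $CQ_r/\sigma^2$ has the variance $2C$ of a $\chi^2_C$ variate. The sketch below, in standard-library Python with hypothetical function names, verifies this in exact rational arithmetic. It uses only $E(\chi^2_k) = k$ and $\operatorname{var}(\chi^2_k) = 2k$ for independent variates.

```python
from fractions import Fraction

def chi2_approx_df(n, r):
    """Degrees of freedom C of (5-10)."""
    return Fraction((n - 1) * (r - 1) * (r - 2), r * r - r - 1)

def exact_moments(n, r):
    """Mean and variance of Q_r / sigma^2 implied by (5-9):
    (n-1)(r-1)(r-2) Q_r / sigma^2 = r(r-2) chi2_{n-1} - chi2_{(n-1)(r-2)}."""
    k = (n - 1) * (r - 1) * (r - 2)
    nu1, nu2 = n - 1, (n - 1) * (r - 2)
    c1 = r * (r - 2)
    mean = Fraction(c1 * nu1 - nu2, k)
    var = Fraction(c1 * c1 * 2 * nu1 + 2 * nu2, k * k)
    return mean, var

for n, r in [(38, 3), (10, 4), (5, 6)]:
    mean, var = exact_moments(n, r)
    C = chi2_approx_df(n, r)
    # C * Q_r / sigma^2 has mean C * mean and variance C^2 * var;
    # a chi-square with C degrees of freedom has mean C and variance 2C
    print(n, r, mean, C, C * C * var, 2 * C)
```

For the example of §6 ($n = 38$, $r = 3$), this gives $C = 74/5 = 14.8$.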


Test 2
Let us again consider the situation of Test 1, except that we now have
Conditions 2: $\sigma_j^2 = \sigma^2$ [$j = 1, \ldots, (r-1)$]; $\sigma^2$ unknown. (5-11)

It is now essentially necessary to compare $Q_r$ with an estimate of $\sigma^2$, and in fact an exact test results.
Let the error sum of squares of the analysis of variance for the two-way classification be

$$E = \sum_{i=1}^{n}\sum_{j=1}^{r}(y_{ij} - y_{i\cdot} - y_{\cdot j} + y_{\cdot\cdot})^2 = \sum_{u=1}^{(n-1)(r-1)}\eta_u^2. \tag{5-12}$$


Under Conditions 2 and $H_0$, $E$ has expectation $(n-1)(r-1)\sigma^2$. Division of both sides of (5-5) by $E$ lets us write

$$\frac{(n-1)(r-1)\,Q_r}{E} = \frac{rF - 1}{F + r - 2} = r - \frac{(r-1)^2}{F + r - 2}, \tag{5-13}$$

where

$$F = \frac{\sum_{u=1}^{n-1}\eta_u^2/(n-1)}{\sum_{u=1}^{(n-1)(r-2)}\eta_{n-1+u}^2/\{(n-1)(r-2)\}}, \tag{5-14}$$

and, under Conditions 2 and $H_0$, $F$ has the usual variance-ratio distribution with $(n-1)$ and $(n-1)(r-2)$ degrees of freedom. (5-13) follows easily from (5-5) when it is noted that $E$ may be expressed as the sum of the two sums of squares in (5-14). Through the association of (5-5) and (4-26), $F$ may be expressed in terms of the original observations as

$$F = \frac{r(r-2)\,G_r}{(r-1)E - rG_r}, \tag{5-15}$$

where

$$G_t = \sum_{i=1}^{n}(y_{it} - y_{i\cdot} - y_{\cdot t} + y_{\cdot\cdot})^2. \tag{5-16}$$




Now $Q_r/E$ is a monotone increasing function of $F$, and we use $F$ as the test statistic rather than a multiple of $Q_r/E$.
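The identity (5-13) and the computing form (5-15) can be checked against each other on arbitrary data. The sketch below is a minimal illustration in standard-library Python, with hypothetical function names and an arbitrary random $8 \times 4$ table. It computes $G_r$ and $E$, forms $F$ by (5-15) and $Q_r$ by (4-26), and confirms that $(n-1)(r-1)Q_r/E$ equals $(rF-1)/(F+r-2)$. It also shows that this function increases with $F$.

```python
import random

def residual_sums(y):
    """G_1, ..., G_r of (5-16) and E of (5-12) for an n x r table y."""
    n, r = len(y), len(y[0])
    row = [sum(v) / r for v in y]
    col = [sum(y[i][j] for i in range(n)) / n for j in range(r)]
    grand = sum(row) / n
    G = [sum((y[i][j] - row[i] - col[j] + grand) ** 2 for i in range(n))
         for j in range(r)]
    return G, sum(G)

def f_statistic(G_r, E, r):
    """Variance ratio F of (5-15)."""
    return r * (r - 2) * G_r / ((r - 1) * E - r * G_r)

def q_r(G_r, E, n, r):
    """Q_r of (4-26), written in terms of G_r and E."""
    return (r * (r - 1) * G_r - E) / ((n - 1) * (r - 1) * (r - 2))

rng = random.Random(2)
n, r = 8, 4
y = [[rng.gauss(0, 1) for _ in range(r)] for _ in range(n)]
G, E = residual_sums(y)
F = f_statistic(G[-1], E, r)
lhs = (n - 1) * (r - 1) * q_r(G[-1], E, n, r) / E
rhs = (r * F - 1) / (F + r - 2)
print(F, lhs, rhs)
```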
We may test $H_0$ under Conditions 2 against one-sided or two-sided alternatives,

$$H_{a1}: \sigma_r^2 > \sigma^2, \tag{5-17}$$

$$H_{a2}: \sigma_r^2 < \sigma^2, \tag{5-18}$$

$$H_{a3}: \sigma_r^2 \neq \sigma^2. \tag{5-19}$$

If the test has significance level $\alpha$, and if $F_\alpha(\nu_1, \nu_2)$ denotes the tabular value of $F$ with $\nu_1$ and $\nu_2$ degrees of freedom such that $P[F > F_\alpha(\nu_1, \nu_2)] = \alpha$, the rules of rejection corresponding respectively to the alternative hypotheses (5-17) to (5-19) are: reject $H_0$ if

(i) $F > F_\alpha[(n-1), (n-1)(r-2)]$,

(ii) $F < 1/F_\alpha[(n-1)(r-2), (n-1)]$,

and (iii) $F > F_{\alpha/2}[(n-1), (n-1)(r-2)]$ or $F < 1/F_{\alpha/2}[(n-1)(r-2), (n-1)]$.
Again, some other $Q_j$ may be substituted for $Q_r$ in the results for Test 2, and the results still hold.

Tests 1 and 2 appear to be useful when one new analytical method is introduced for comparison with established methods that are known to have homogeneous variances. Similarly, a new judge may be introduced on a panel of trained and consistent judges.


Test 3
We should like a test of homogeneity of variances, that is, a test of the null hypothesis

$$H_0: \sigma_j^2 = \sigma^2 \quad (j = 1, \ldots, r),$$

against the general alternative hypothesis

$$H_a: \sigma_j^2 \neq \sigma^2 \quad (\text{for at least some } j).$$

We shall assume $\sigma^2$ unknown. Extreme difficulties were encountered in developing exact distribution theory for this test, even in the case with $n = r = 3$. Instead we have developed a test based on large-sample likelihood-ratio theory, but only for the case in which $r = 3$.

We have shown in §4 that $Q_j$ is identical to $\hat\sigma_j^2$ ($j = 1, 2, 3$) when $r = 3$. $\hat\sigma_j^2$ is the maximum-likelihood estimator of $\sigma_j^2$ in reference to the likelihood function of (3-14). The logarithm $L$ of the likelihood function is defined following (3-18). We require $L(\hat\Omega)$, the maximum of $L$ under $H_a$, obtained by substituting $Q_j$ for $\sigma_j^2$ in $L$, and $L(\hat\omega)$, obtained under $H_0$ by substituting the maximum-likelihood estimator of $\sigma^2$ for each $\sigma_j^2$ in $L$. We now proceed to derive $L(\hat\Omega)$ and $L(\hat\omega)$.

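To evaluate $L(\hat\Omega)$, we essentially need to evaluate $|\hat\Sigma^{-1}|$ and $-\tfrac{1}{2}Z'\hat\Sigma^{-1}Z$, where the caret indicates that $Q_j$ is substituted for $\sigma_j^2$ ($j = 1, 2, 3$) in $\Sigma^{-1}$. Now $\Sigma^{-1} = (D_n D_n')^{-1} \times H^{-1}$ and,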
to evaluate $71, we need H since (D, D; 1 bu 


A> =Ê - 


, Don ., DSU | (5-20) 


-ZD,D;3Z F DI 
and -E b. DI ZU, D; £] ZD, Bg (5:21) 
where $Z_1'$ and $Z_2'$ are $(n-1)$-element row vectors consisting respectively of the first $(n-1)$ and the remaining $(n-1)$ elements of the $2(n-1)$-element row vector $Z'$. The matrix multiplication $Z'\hat\Sigma^{-1}Z$ permits us to write

$$-\tfrac{1}{2}Z'\hat\Sigma^{-1}Z = -(n-1) \tag{5-22}$$

in view of (5-20) and (5-21). By the rules of direct-product multiplication of matrices,* we have

$$|\hat\Sigma^{-1}| = |(D_n D_n')^{-1} \times \hat H^{-1}| = n^{-2}\,|\hat H|^{-(n-1)}, \tag{5-23}$$

the latter form following since $|(D_n D_n')^{-1}| = 1/n$, a result that may be developed from (3-16). We also require $|\hat H|$ in terms of the estimators instead of the form (5-21) above; then, from (4-18), it follows that

$$|\hat H| = Q_1 Q_2 + Q_1 Q_3 + Q_2 Q_3. \tag{5-24}$$

(5-23) and (5-24) permit evaluation of $|\hat\Sigma^{-1}|$. Substitution in $L$ of the results of (5-22), (5-23) and (5-24) yields the final form

$$L(\hat\Omega) = -\tfrac{1}{2}(n-1)(r-1)\ln 2\pi - \ln n - \tfrac{1}{2}(n-1)\ln(Q_1Q_2 + Q_1Q_3 + Q_2Q_3) - (n-1). \tag{5-25}$$
We now evaluate $L(\hat\omega)$. In this case, with $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma^2$, we have


X= s (D, Dag x H 2 | (5-26) 

and
$$|\Sigma^{-1}| = n^{-2}(3\sigma^4)^{-(n-1)} \tag{5-27}$$
from (3-15) and (4-18), and

| -ZDZ =- sa [Da x i 5 || Z, (5-28) 

= - ga Ie I. (5-29) 

$$= -\frac{(n-1)}{3\sigma^2}(Q_1 + Q_2 + Q_3). \tag{5-30}$$

(5-29) follows from the definitions (3-23) and the derivatives (4-19); (5-30) follows from (3-24) to (3-26) and the equivalence of $Q_j$ and $\hat\sigma_j^2$ ($j = 1, 2, 3$) when $r = 3$. Then $L$ becomes


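$$L_\omega = -\tfrac{1}{2}(n-1)(r-1)\ln 2\pi - \ln n - \tfrac{1}{2}(n-1)\ln 3 - (n-1)\ln\sigma^2 - \frac{(n-1)}{3\sigma^2}(Q_1 + Q_2 + Q_3). \tag{5-31}$$

The maximum-likelihood estimator of $\sigma^2$ obtained from $L_\omega$ is

$$\hat\sigma^2 = \tfrac{1}{3}(Q_1 + Q_2 + Q_3), \tag{5-32}$$

and substitution in $L_\omega$ leads us to

$$L(\hat\omega) = -\tfrac{1}{2}(n-1)(r-1)\ln 2\pi - \ln n - \tfrac{1}{2}(n-1)\ln 3 - (n-1) - (n-1)\ln\{\tfrac{1}{3}(Q_1 + Q_2 + Q_3)\}. \tag{5-33}$$

Note further that

$$\tfrac{1}{3}(Q_1 + Q_2 + Q_3) = E/\{2(n-1)\} \tag{5-34}$$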
* See MacDuffee (1946), p. 82. 
from (4-26) and (5-12). With substitution of (5-34) in (5-33), we finally obtain

$$-2\ln\lambda = -2\{L(\hat\omega) - L(\hat\Omega)\} = -(n-1)\left[\ln(Q_1Q_2 + Q_1Q_3 + Q_2Q_3) - 2\ln E + 2\ln(n-1) + \ln\tfrac{4}{3}\right]. \tag{5-35}$$


An approximate test of $H_0$ against $H_a$ is obtained by taking $-2\ln\lambda$ to have a $\chi^2$ distribution with 2 D.F. This test becomes approximately correct for large values of $n$. The computation of $-2\ln\lambda$ causes little difficulty when values of $Q_1$, $Q_2$ and $Q_3$ are known. We know that values of $Q_j$ may sometimes be negative. However, $(Q_1Q_2 + Q_1Q_3 + Q_2Q_3)$ is positive with probability one. To show this, we write

$$Q_1Q_2 + Q_1Q_3 + Q_2Q_3 = (Q_1 + Q_2)(Q_1 + Q_3) - Q_1^2 \tag{5-36}$$

and substitute in the right-hand member of (5-36) using the forms (5-4) with $r = 3$. Then we need to show that

$$\left(\sum_{g=1}^{n-1} a_g^2\right)\left(\sum_{g=1}^{n-1} b_g^2\right) > \left(\sum_{g=1}^{n-1} a_g b_g\right)^2,$$

where $a_g$ and $b_g$ denote the linear functions of the new variables $\eta$ that (5-4) associates with $Q_1 + Q_2$ and $Q_1 + Q_3$. But this is Cauchy's inequality, true for all values of the variables except when $a_g/b_g$ is a constant for all $g = 1, \ldots, (n-1)$. The exception is an event that occurs with probability zero and yields an equality instead of the required inequality.
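For $r = 3$ the argument can also be seen directly in the original observations. One can check from (4-26) that $Q_1$ is the sample covariance of $a_i = y_{i1} - y_{i2}$ and $b_i = y_{i1} - y_{i3}$. Likewise $Q_1 + Q_2$ is the sample variance of $a_i$, and $Q_1 + Q_3$ is the sample variance of $b_i$. Hence (5-36) is $\operatorname{var}(a)\operatorname{var}(b) - \operatorname{cov}(a, b)^2$, the Cauchy quantity. The standard-library Python sketch below verifies these identities on an arbitrary simulated table. The identities and the function names are our own illustration, not taken from the text.

```python
import random

def q_estimates(y):
    """Q_1, ..., Q_r from (4-26) for an n x r table."""
    n, r = len(y), len(y[0])
    row = [sum(v) / r for v in y]
    col = [sum(y[i][j] for i in range(n)) / n for j in range(r)]
    grand = sum(row) / n
    d = [[y[i][j] - row[i] - col[j] + grand for j in range(r)] for i in range(n)]
    G = [sum(d[i][j] ** 2 for i in range(n)) for j in range(r)]
    E = sum(G)
    return [(r * (r - 1) * g - E) / ((n - 1) * (r - 1) * (r - 2)) for g in G]

def cov(u, v):
    """Sample covariance with divisor n - 1."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (n - 1)

rng = random.Random(3)
y = [[rng.gauss(0, s) for s in (1.0, 2.0, 0.3)] for _ in range(6)]
Q1, Q2, Q3 = q_estimates(y)
a = [row[0] - row[1] for row in y]   # y_i1 - y_i2
b = [row[0] - row[2] for row in y]   # y_i1 - y_i3
# Q1 = cov(a, b), Q1 + Q2 = var(a), Q1 + Q3 = var(b), so (5-36) equals
# var(a) var(b) - cov(a, b)^2, which Cauchy's inequality keeps positive
s = Q1 * Q2 + Q1 * Q3 + Q2 * Q3
print(Q1, cov(a, b), s, cov(a, a) * cov(b, b) - cov(a, b) ** 2)
```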


6. COMPUTATIONAL TECHNIQUES AND A NUMERICAL EXAMPLE 


We illustrate the computations required to apply the methods here developed through 
consideration of an example. The example below is based on chemical determinations; a
second example is given in detail by Bradley (1958) and is based on the subjective scoring 
of dried veneer as described by Kauman, Gottstein & Lantican (1956). 

In the distilling industry, ground grains, usually corn and rye with about 1% of malt, are slurried with water and cooked with constant agitation to 100° C. This gelatinizes the starch. The term 'mash' is applied to this cooked grain slurry. The mash is cooled to 62° C. and additional malt meal is added. The malt converts the starch to sugars. After 20-30 min. at 62° C., the converted mash is cooled to room temperature and pumped through pipe coolers to the fermentors. Several such cooks are required for each fermentor. A yeast mash containing actively growing yeast is added, and the mixture is diluted to a definite volume in


of proof gallons per bushel (56 lb.) of grain. A proof gallon is 1 gal. containing 50% alcohol by volume. The data in Table 1 are the yields, suitably coded, from each of three fermentors for 38 days.


We first compute the estimates of variability, $Q_1$, $Q_2$ and $Q_3$, for the three fermentors. The row and column means $y_{i\cdot}$ and $y_{\cdot j}$ are shown on the borders of Table 1; row and column totals would normally also appear but have been omitted here to save space. In Table 2 we show the residuals $(y_{ij} - y_{i\cdot} - y_{\cdot j} + y_{\cdot\cdot})$ required in the computation. Row and column totals in Table 2 should be zero, and this is a check on the computation; slight departures from these zero totals in Table 2 result from rounding errors. We have sometimes found it helpful to compute a table of values of $(y_{ij} - y_{i\cdot})$ as an intermediate step between


ey 


G 
Tables 1 and 2 but do not show such a table here. The sums of squares of the entries in the columns of Table 2 are shown at the bottom of that table and designated as $G_1$, $G_2$, $G_3$ in accordance with the definition (5-16). Now, of course, $E = G_1 + G_2 + G_3$, the error sum of squares of the analysis of variance, and, from (5-12), (5-16) and (4-26), we may write

$$Q_j = \frac{rG_j}{(n-1)(r-2)} - \frac{E}{(n-1)(r-1)(r-2)} \quad (j = 1, \ldots, r). \tag{6-1}$$


A further computing check is effected through the comparison of $\sum_j G_j$ with $E$ from the analysis of variance, for it is likely that the analysis of variance will have been computed in most cases. In our example $r = 3$ and $n = 38$. Using (6-1), and $G_1$ and $E$ from Table 2, we illustrate by noting that

$$Q_1 = \frac{3(0.032520)}{(37)(1)} - \frac{0.081414}{(37)(2)(1)} = 0.001537.$$


Similar computation leads to the remaining values of Q, in the last row of Table 2. 
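The computation by (6-1) is easy to script. The sketch below, in standard-library Python with a hypothetical function name, takes the column sums of squares at the foot of Table 2 and returns all three $Q_j$. It also checks that $\sum_j Q_j = rE/\{(n-1)(r-1)\}$, which is (5-34) when $r = 3$. Recomputed from the printed $G_j$, $Q_1$ agrees with 0.001537. $Q_2$ and $Q_3$ come out near 0.00172 and 0.000044, slightly different from the printed 0.001722 and 0.000041, possibly because the $G_j$ in Table 2 are rounded.

```python
def q_from_column_sums(G, n):
    """Q_j of (6-1) from the column sums of squares G_1, ..., G_r of Table 2."""
    r = len(G)
    E = sum(G)   # error sum of squares, E = G_1 + ... + G_r
    return [r * g / ((n - 1) * (r - 2)) - E / ((n - 1) * (r - 1) * (r - 2))
            for g in G]

G = [0.032520, 0.034786, 0.014108]   # bottom row of Table 2
Q = q_from_column_sums(G, n=38)
print([round(q, 6) for q in Q])
```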

The process of charging the fermentors is such that the first fermentor receives less than 
its share of grains and the third fermentor more than its share of grains. The yield of the 
third fermentor should be higher than that of the first and this is substantiated by the data. 
It appeared from experience that the variability in yield of the third fermentor was less than that of the other two. We apply Test 2 to the sample data to compare fermentor yield variabilities.

Recall that Test 2 supposes that $\sigma_1^2 = \sigma_2^2 = \sigma^2$, and we test the hypothesis $H_0: \sigma_3^2 = \sigma^2$ against the alternative $H_{a2}: \sigma_3^2 < \sigma^2$ of (5-18). We use the $F$-statistic of (5-15) and the rule of rejection (ii). For the example,

$$F = \frac{3(1)(0.014108)}{(2)(0.081414) - (3)(0.014108)} = 0.35,$$

obtained through substitution of the results of Table 2 in (5-15). Here $F$ has $(n-1) = 37$ and $(n-1)(r-2) = 37$ D.F. Taking the significance level to be 0.05, we find the tabular value of $F_{0.05}[(n-1)(r-2), (n-1)]$ to be 1.72 and $1/F_{0.05}$ to be 0.58. Clearly the observed value of $F$ is significant at the 0.05 level; in fact it is very close to the value for the 0.001 level of significance. The alternative hypothesis $H_{a2}$ is accepted.
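The same decision can be reproduced in a few lines. The sketch below, in standard-library Python with hypothetical function names, evaluates (5-15) from the Table 2 values and applies rule (ii). Because the standard library has no $F$ quantile function, the tabular value 1.72 quoted in the text is entered by hand.

```python
def f_statistic(G_r, E, r):
    """Variance ratio of (5-15) for testing sigma_r^2 against the common sigma^2."""
    return r * (r - 2) * G_r / ((r - 1) * E - r * G_r)

def reject_rule_ii(F, F_tab):
    """Rule (ii): reject H_0 in favour of sigma_r^2 < sigma^2 when F < 1/F_tab,
    where F_tab = F_alpha[(n-1)(r-2), (n-1)] is read from a table."""
    return F < 1 / F_tab

F = f_statistic(G_r=0.014108, E=0.081414, r=3)   # fermentor 3, Table 2
F_tab = 1.72   # F_0.05 on 37 and 37 D.F., as quoted in the text
print(round(F, 3), round(1 / F_tab, 2), reject_rule_ii(F, F_tab))
```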

We also apply the approximate Test 3 to the given data to illustrate the use of that test. This is the general test of homogeneity of variances given $r = 3$. Substitution is made directly in (5-35), and we obtain, with the use of common logarithms and a conversion factor to natural logarithms,

$$-2\ln\lambda = -(37)(2.3026)\left[\log\{(0.001537)(0.001722) + (0.001537)(0.000041) + (0.001722)(0.000041)\} - 2\log 0.081398 + 2\log 37 + \log\tfrac{4}{3}\right] = 9.83.$$

$-2\ln\lambda$ is taken to have a $\chi^2$-distribution with 2 D.F., and the observed value lies between the tabular values for the 0.01 and 0.005 levels of significance.
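Equation (5-35) can also be evaluated directly in natural logarithms. The sketch below, in standard-library Python with a hypothetical function name, does so for the printed values. For 2 D.F. the $\chi^2$ upper tail has the closed form $e^{-x/2}$, so no table is needed. The script gives about 9.86 rather than the printed 9.83; the small difference may reflect rounding in the hand computation with common logarithms. Either value lies between the 0.01 and 0.005 points.

```python
import math

def minus_two_log_lambda(Q, E, n):
    """-2 ln(lambda) of (5-35); valid only for r = 3."""
    q1, q2, q3 = Q
    s = q1 * q2 + q1 * q3 + q2 * q3   # positive with probability one
    return -(n - 1) * (math.log(s) - 2 * math.log(E)
                       + 2 * math.log(n - 1) + math.log(4 / 3))

stat = minus_two_log_lambda([0.001537, 0.001722, 0.000041], E=0.081398, n=38)
# For 2 D.F. the chi-square upper-tail probability is exactly exp(-x/2).
p_value = math.exp(-stat / 2)
print(round(stat, 2), round(p_value, 4))
```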
Tests 2 and 3 are those of primary importance and they have been demonstrated with the 


example. Test 1 will not usually be useful and has not been discussed in this section. 


Table 1. Coded yields $y_{ij}$ for three fermentors for 38 days

Table 2. Residuals $(y_{ij} - y_{i\cdot} - y_{\cdot j} + y_{\cdot\cdot})$ for three fermentors for 38 days

                 Fermentor 1   Fermentor 2   Fermentor 3
G_j              0.032520      0.034786      0.014108
Q_j              0.001537      0.001722      0.000041

E = G_1 + G_2 + G_3 = 0.081414
7. DISCUSSION AND SUMMARY

The problem of obtaining estimators for the individual, one-way, error variances in an unreplicated, two-way classification has been studied. The estimators obtained are equivalent to those previously used by Grubbs and Ehrenberg, although the methods of derivation are different. The estimators are shown to be maximum-likelihood estimators in a
certain sense for classifications with r = 3 columns, and also result more generally from 
consideration of reasonable restrictions imposed on a general quadratic form in the 
observations. 

Test procedures, not previously available, are developed and applied in a numerical 
example that illustrates the necessary computations. One test is for homogeneity of the 
one-way variances and is available based on large-sample theory when r = 3. Another test, 
which is exact for small samples and unrestricted values of r and n, is for homogeneity of 
the one-way variances again, but under the assumption that it is known that all but one 
specified variance are homogeneous. A discussion of the distribution of an individual 
estimate of variance is included under a third test procedure like the second discussed above 
except that then it is supposed that the common value of the homogeneous variances is 
known. We have applied the methods of this paper in several areas of research and found 
the results useful and informative. 

Test 3, the general test of homogeneity of variances, is available only when $r = 3$ and is only asymptotically exact as $n$ becomes large. We have attempted to obtain a small-sample test based on the statistic

$$T = \frac{1}{E^2}\sum_{j=1}^{r}\left\{Q_j - \frac{E}{(n-1)(r-1)}\right\}^2,$$

or on $\sum_j Q_j^2/E^2$, which is monotonically related to $T$. In the special case in which $n = r = 3$, we obtained an exceedingly complex distribution function for $T$, which is shown by Russell (1956). This result did not suggest the appropriate generalization for larger values of $n$ and $r$, nor did it suggest that further consideration of the small-sample distribution of $T$ is likely to be fruitful. Further study may lead to an approximate procedure for values of $r > 3$.


the data was prepared by Dr Conner and given here almost verbatim. Suggestions by 
D. B. Duncan, W. A. Thompson, Jr, P. D. Minton and P. N. Somerville along with their
STUDIES IN THE HISTORY OF PROBABILITY AND STATISTICS
VII. THE PRINCIPLE OF THE ARITHMETIC MEAN


By R. L. PLACKETT 
University of Liverpool 


The history of the problem of combining a set of independent observations on the same quantity is 
traced from antiquity to the appearance in the eighteenth century of the arithmetic mean as a 
statistical concept. 


The problem of estimating parameters from observational data appears first to have presented itself to the Babylonian astronomers of the last three centuries B.C. Their achievements are recorded in cuneiform script on clay tablets and have been analysed by Neugebauer (1951), who has also (1955) published a collection of the texts. The following summary is abstracted from his researches. Between about 500 and 300 B.C., the Babylonians developed a systematic mathematical theory to account for the motions of the sun, moon and planets, and they evolved simple arithmetical schemes by which the positions of these bodies could be calculated at regular intervals of time. Beyond the fact that the basic parameters in the schemes represent a compromise between observation and the needs of computation, nothing has survived to indicate how they were estimated from the original data, which are themselves almost wholly absent.

Rather more information is available concerning the methods by which the Greek astronomers analysed their observational data, for their discoveries were made possible partly by developments of mathematical technique, and partly by the steady accumulation, since about 300 B.C., of a series of observations on the positions of stars and planets, made with graduated instruments. The Syntaxis of Claud Ptolemy not only presents a complete account of what was known to them, but also contains nearly everything that survives of the work of their greatest representative, Hipparchus. In what follows, we refer to the edition in two volumes translated and annotated by Karl Manitius (1913).

According to I, p. 133, Hipparchus noticed inequalities in the intervals of time between successive passages of the sun through the same solstitial point, and this suggested to him the question whether or not the length of the tropical year is constant. He considered, however, that the error in his observations and in the calculations based on them might amount to as much as ¼ day, and he concluded that any variation in the length of the year
was quite insignificant. Subsequently, Hipparchus estimated the maximum variation in 
length as } day, apparently by taking half the range of his observations (I, pp. 136-7). 


position Hipparchus determines the position: 
intervals. 


The technique of taking the arithmetic mean of a group of comparable observations had not yet, however, made its appearance as a general principle. This is shown by Ptolemy's estimation of the amount by which the length of a year exceeds 365 days. Hipparchus had made the observations given below (I, pp. 134-5):


Autumn equinox
(1) 162 B.C. Sept. 27, 18h
(2) 159 B.C. Sept. 27, 6h
(3) 158 B.C. Sept. 27, 12h
(4) 147 B.C. Sept. 26/27, midnight
(5) 146 B.C. Sept. 27, 6h
(6) 143 B.C. Sept. 26, 18h

Spring equinox
(1) 146 B.C. March 24, 6h (11h at Alexandria)
(2) 135 B.C. March 23/24, midnight
(3) 128 B.C. March 23, 18h


Ptolemy gives (I, p. 142) a single observation of his own on the Autumn equinox, namely, A.D. 139 Sept. 26, 7h, and compares it with the fourth observation of Hipparchus. From this he finds that in 285 Egyptian years of 365 days the Autumn equinox advances by 70 days 7 hours, which he writes as 70 + ¼ + 1/20 days. He then gives (I, p. 143) a single observation of his own on the Spring equinox, namely, A.D. 140 March 22, 13h. By comparing it with the first observation of Hipparchus, he again arrives at an advance of 70 + ¼ + 1/20 days in 285 Egyptian years. A year of 365¼ days would imply an advance of 71¼ days in 285 years, and the decrement of 71¼ − (70 + ¼ + 1/20) = 19/20 day in 285 years is equivalent to 1 day in 300 years. Thus Ptolemy reaches the value of 365¼ − 1/300 days for the length of the year, and this is precisely the value which Hipparchus is quoted (I, p. 145) as having found.
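Ptolemy's arithmetic can be retraced exactly with rational numbers. The sketch below uses Python's standard `fractions` module, and the variable names are illustrative only. It takes the rounded advance 70 + ¼ + 1/20 days (70 days 7 hours, with 7/24 ≈ 0.29 day rounded to 0.3) and recovers the deficit of 19/20 day, one day in 300 years, and a year of 365¼ − 1/300 days.

```python
from fractions import Fraction as Fr

advance = 70 + Fr(1, 4) + Fr(1, 20)   # advance of the equinox in 285 Egyptian years (days)
julian = 285 * Fr(1, 4)               # advance implied by a year of 365 1/4 days
deficit = julian - advance            # shortfall over 285 years, in days
years_per_day = 285 / deficit         # years needed to accumulate one day
year = 365 + Fr(1, 4) - 1 / years_per_day
print(julian, deficit, years_per_day, year, float(year))
```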

A similar example of Ptolemy's veneration for Hipparchus is provided by his discussion of the precession of the equinoxes. This phenomenon was discovered by Hipparchus and is caused by the motion of the pole of the equator round the pole of the ecliptic, the annual movement being about 50". According to a quotation in II, p. 15, Hipparchus estimated the change in the position of the solstices and equinoxes to be at least 1/100° per annum. Ptolemy then gives (II, pp. 18-20) a catalogue of the declinations of 18 stars as observed by (i) Timocharis and Aristyllus, about 290 B.C., (ii) Hipparchus, and (iii) himself. He selects 6 stars from the catalogue and shows that they all lead to a precessional constant of approximately 1/100° per annum. For Ptolemy this is thus an estimate, whereas for Hipparchus it was a lower limit. These unique data have been analysed by several commentators, beginning with Delambre (1817, pp. 254-5). He showed that the average precessional constant from all 18 stars is near the correct value, whether the changes of declination from (i) to (ii) or from (ii) to (iii) are taken. Recently Pannekoek (1955) has confirmed the accuracy of Ptolemy's arithmetic. He suggests that Ptolemy selected the 6 stars which agreed best with the value of 1/100° per annum, although each of them actually exhibits too small a change of declination.

The technique of repeating and combining observations made on the same quantity 
appears to have been introduced into scientific method by Tycho Brahe towards the end 
of the sixteenth century. According to his biographer, Dreyer (1890, p. 350): 


Each observation thus gave a value for the right ascension of a Arietis. During see papi ipe 
‘Tycho repeated these observations as often as an opportunity offered, and, in order 1 ze n 
effect of parallax and refraction, he combined the results in groups of two, so that one Mer Sy ae 
observation of Venus while east of the sun, the other on an observation of Venus west o. t d Pes Me le 
the observations were selected so that Venus and the bags as jn 5 Leathe a ha atte RR vA es ina- 
tion and distance from the earth in the two cases. m bserv: j hori as 
single determinations, and from the years 1582-88 twelve results, each being the mean o : 
found in the manner just described. The fifteen values of the right ascension of t San, decim 
fully well inter se, the probable error of the mean being only +6”, but kiss 5 . dle 
the twelve groups show rather considerable discordances, the greatest an. small ring 2 
But anyhow the final mean adopted by Tycho is an exceedingly good one, agreeing well with the best modern determinations. He adopts for the end of the year 1585 26° 0' 30", the modern value for the same date being 26° 0' 45".

The observations to which Dreyer refers are reproduced below from Tycho's collected 
works (2, 170-97): 


1582 February 26 26° 0" 44” 
1582 March 20 26 0 32 
1582 April 3 26 0 30 


1582 February 27 26° 4'10* Rt 
1585 September 21 25 56 23 


1582 March 5 25 56 33 M 
1585 September 14 26 4 43 

1582 March 5 35 59 15 be 2d 
1585 September 15 26 1 21 

1582 March 9 25 59 49 

1585 September 15 26 1 16 es 
1586 December 26 23 54 5l \ iocus 
1588 December 15 26 6 32 

1586 December 27 28 52 22 ^ 
1588 November 29 20 8 52 

1587 January 9 20 2 5 26 0 27 
1588 December 6 95 58 49 

1587 January 24 26 6 44 26 0 29 
1588 October 26 25 54 13 

1587 August 17 26 5 40 

1588 April 16 25 54 48 eee in 
1587 August 17 TR X TE 

1588 April 16 25 59 aJ BIO 4 
1587 August 18 25 54 35 I 
1588 March 28 26 6 20 : 
1587 August 18 35 54 49 Pa) 
1588 April 16 26 6 30 $ 


The process of combining the first pair is thus described by Tycho (ibid. p. 171). 


From this, again subtracting the difference of ascension up to the bright star of Aries, which is 83° 57' 20", there results the ascension of the bright star of Aries, 25° 56' 10". To this, 13" are to be added for the 3 remaining months, and we shall obtain the right ascension of the bright star of Aries, 25° 56' 23", corresponding to the completed year 1585. But in the year [15]82, from 27 February, the same right ascension, given earlier, was 26° 4' 16", so that the difference between the two is 7' 53". Half of this, 3' 56½", added to the smaller or subtracted from the larger, gives the true and definite right ascension of the bright star of Aries, 26° 0' 20". It was our purpose to investigate this by this method, taking no account of parallaxes and refractions, but letting them correct each other in this way.


The average of the twelve determinations by means of two is 26° 0' 27", and the average of all fifteen is 26° 0' 29". How Tycho arrives at 26° 0' 30" is not described, but we note that the co-ordinates of the nine standard stars in his catalogue are all given at 5" intervals, more than adequate for observational purposes. In fifteen cases out of eighteen, the co-ordinates differ from their exact values by less than 1'. Kepler has described in a famous passage (Astronomia Nova..., Chap. 19; Werke, 3, 178) how he was able to calculate the elements of a circular orbit for Mars differing from Tycho's observations by 8' or less, but rejected it because he knew that errors of 8' could not be neglected with so diligent an observer.
We see that Tycho used the arithmetic mean to eliminate systematic errors. The calculation of the mean as a more precise value than a single measurement is not far removed, and it had certainly appeared about the end of the seventeenth century, as is shown by the following extract from Flamsteed's discussion of the errors produced by his mural arc on the right ascensions of stars (1725, vol. 3, p. 137):


The difference of the Sun's right ascensions between 14 March and 15 September [of 1690] is found from the observations about the Sun on those days, namely:

by the heel of Castor                    [illegible]
by Procyon                               178° 36'  5"
by Pollux                                178° 36' 20"
Mean of these differences                178° 36'  8"
Subtracting this mean from the Sun's right ascension on 15 September,
                                         182° 31' 53"
                                         178° 36'  8"
there remains its true right ascension at noon on 14 March,
                                           3° 55' 45"
which gives its true place                 4° 17'  7"
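Flamsteed's final subtraction can be checked the same way. The sketch below (helper names are ours) uses only the printed values; the determination via Castor is illegible and is not used. It recovers 3° 55' 45" and shows the scatter of the two legible determinations about the mean:

```python
def to_arcsec(deg, minutes, seconds):
    """Convert degrees, minutes and seconds of arc to arcseconds."""
    return (deg * 60 + minutes) * 60 + seconds

def to_dms(arcsec):
    """Convert arcseconds back to (degrees, minutes, seconds)."""
    deg, rem = divmod(arcsec, 3600)
    minutes, seconds = divmod(rem, 60)
    return deg, minutes, seconds

# Values as printed in Flamsteed's table (14 March to 15 September 1690).
mean_difference = to_arcsec(178, 36, 8)   # mean of the star-based differences
sun_ra_15_sept = to_arcsec(182, 31, 53)   # Sun's right ascension on 15 September

# Subtracting the mean difference gives the Sun's right ascension on 14 March.
sun_ra_14_march = sun_ra_15_sept - mean_difference
print(to_dms(sun_ra_14_march))  # (3, 55, 45)

# Deviations of the two legible determinations from the mean, in arcseconds.
deviations = {
    "Procyon": to_arcsec(178, 36, 5) - mean_difference,
    "Pollux": to_arcsec(178, 36, 20) - mean_difference,
}
print(deviations)  # {'Procyon': -3, 'Pollux': 12}
```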


A third example illustrates the combination of data from different observers. During 
1736-7, a French expedition under Maupertuis was sent to Lapland in order to measure the 
length of a degree of latitude and, by comparing it with the corresponding length in France, 
to decide whether the earth was flattened at the poles, as maintained, e.g. by Newton, or 
at the equator, as held by the Cassini family. Their method of observation, as described by 
Outhier, has been summarized by Clarke (1880, p. 5) as follows: 

Each observer made his own observation of the angles and wrote them down apart, they then 
took the means of these observations for each angle: the actual readings are not given, but the 
mean 18. 

In the event, the degree proved to be longer in Lapland, and Voltaire congratulated 
Maupertuis on having flattened both the poles and the Cassinis.

At about this time, the calculus of discrete probability assumed an organized form, and 
the appearance of the differential calculus made extensions to continuous probability 
possible. The distribution of the arithmetic mean now began to receive the attention of 
mathematicians who were conversant with the new techniques, and a pioneer study by 
Simpson was followed by a long memoir from Lagrange.

In his paper of 1755, Simpson gives the probability that the mean of $t$ observations is at
most $m/t$ for the following two error distributions:


(i) possible errors are $-v, \dots, -2, -1, 0, 1, \dots, v$ and equal probabilities are attached
to them;

(ii) the same set of errors with probabilities proportional to $1, 2, \dots, v, v+1, v, \dots, 2, 1$
respectively.
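Simpson's calculation is a generating-function computation: the distribution of the sum of $t$ errors is read off the $t$-th power of the generating function of a single error. A minimal Python sketch with exact fractions (the function names and integer arguments $m$, $t$, $v$ are our choices, not Simpson's notation) is:

```python
from fractions import Fraction

def error_weights(v, triangular=False):
    """Relative probabilities of the errors -v, ..., v under case (i) or (ii)."""
    if triangular:
        return [v + 1 - abs(e) for e in range(-v, v + 1)]  # 1, 2, ..., v+1, ..., 2, 1
    return [1] * (2 * v + 1)                               # equal weights

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists, lowest power first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def prob_mean_at_most(m, t, v, triangular=False):
    """Probability that the mean of t independent errors is at most m / t."""
    w = error_weights(v, triangular)
    gen = [1]
    for _ in range(t):
        gen = poly_mul(gen, w)           # generating function of the sum of errors
    # Coefficient k belongs to the sum k - t*v; collect sums not exceeding m.
    favourable = sum(c for k, c in enumerate(gen) if k - t * v <= m)
    return Fraction(favourable, sum(gen))

print(prob_mean_at_most(0, 2, 1))                   # 2/3
print(prob_mean_at_most(0, 2, 1, triangular=True))  # 11/16
```

Squaring the generating function $1 + x + \dots + x^v$ reproduces the weights of case (ii); for $v = 2$, `poly_mul([1, 1, 1], [1, 1, 1])` gives `[1, 2, 3, 2, 1]`.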


The solution for (i), when expressed as a gaming problem, was known [...], and the
treatment by generating functions is the same as de Moivre's [...]. [...] The
generating function for (ii) is the square of what it is for (i), [and] Simpson's [...]
amounted mainly to realizing the physical interpretation of a [...].

What is novel in Simpson's work appears in the four pages [...]
in 1757. Here he extends the solution of the second problem to the [case in which the]
error distribution is continuous, in the form of an isosceles [triangle, and]
finds the probability that the mean is nearer to zero than a single [...].
Simpson's debt to de Moivre is clear, and the widespread respect which The Doctrine of
Chances inspired during this period is notably attested by the following quotation from a
letter written by Lagrange to Laplace on 30 December 1776.


It is true that I once had the idea of producing a translation of de Moivre's work,
accompanied by notes and additions of my own, and I had even already translated part of the
work; but I long ago gave up this project, and I am delighted to learn that you
have undertaken it, convinced that it will live up to the high opinion one has of everything
that comes from your pen.

In the first fifty pages of his memoir, Lagrange presents a detailed discussion of discrete
error distributions, on lines essentially the same as those followed by Simpson; he again makes
free use of generating functions, and again extends results from discrete to continuous [...]

[...] parameters in a multinomial distribution; and purports to show (problems 4 and 5) that the
mode of the distribution of sample means is the same as the population mean. The chief
contribution of the memoir to the probability theory of the arithmetic mean occurs in the
last twelve pages, where Lagrange gives a method of obtaining the results for continuous
distributions directly. He begins by evaluating


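$$ \int_0^\infty x^{m-1} a^{-x}\,dx = \frac{(m-1)!}{(\log a)^m}, $$


where $a$ is larger than unity. He now says that the coefficient of $a^{-x}$ in
$$ \frac{P a^{-p} + Q a^{-q} + R a^{-r} + \dots}{(\log a)^m} $$
is obtained on replacing
$$ \frac{1}{(\log a)^m} \quad \text{by} \quad \frac{1}{(m-1)!} \int_0^\infty x^{m-1} a^{-x}\,dx, $$
and is thus given by
$$ \frac{\{P(x-p)^{m-1} + Q(x-q)^{m-1} + R(x-r)^{m-1} + \dots\}\,dx}{(m-1)!}. $$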
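Lagrange's rule can be tested numerically. In the sketch below (our own code and names, not Lagrange's notation), each term $P a^{-p}/(\log a)^m$ contributes $P(x-p)^{m-1}/(m-1)!$ for $x > p$ and nothing for $x < p$, since the substitution shifts the range of integration to $x > p$. As a test case, anticipating the $n$-fold product Lagrange uses next, we take the sum of $n$ errors uniform on $(0, 1)$: the transform $\int_0^1 a^{-x}dx = (1 - a^{-1})/\log a$ has an $n$-th power of exactly the form above, and the rule yields the familiar density of a sum of uniform variables.

```python
import math

def lagrange_coefficient(terms, m, x):
    """Coefficient of a**(-x) in sum(P * a**(-p)) / (log a)**m by Lagrange's rule.

    terms: list of (P, p) pairs; each contributes P*(x - p)**(m-1)/(m-1)!
    when x > p, and nothing otherwise.
    """
    total = 0.0
    for coeff, shift in terms:
        if x > shift:
            total += coeff * (x - shift) ** (m - 1) / math.factorial(m - 1)
    return total

def sum_of_uniforms_density(n, x):
    """Density of the sum of n independent U(0, 1) errors.

    (1 - a**-1)**n / (log a)**n = sum_k C(n, k) (-1)**k a**(-k) / (log a)**n.
    """
    terms = [((-1) ** k * math.comb(n, k), k) for k in range(n + 1)]
    return lagrange_coefficient(terms, n, x)

print(sum_of_uniforms_density(2, 1.5))   # triangular density at 1.5
print(sum_of_uniforms_density(3, 1.5))   # peak of the three-fold sum
```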

He next asserts that the probability element for the sum of $n$ independent variables, each
with density function $y(x)$, is the coefficient of $a^{-x}$ in $\left[\int y\,a^{-x}\,dx\right]^n$, where the term 'coefficient'
is used in the sense just defined. Several examples follow, in all of which the error distribution
has a finite range, so that $\int y\,a^{-x}\,dx$ is a sum of terms like (1), and is therefore amenable to
the processes he has described. The last error distribution is given by


y = K cosg (Ar S qu), 
and the memoir concludes with a set of ingenious manipulations involving imaginary
quantities.

At this interval of time, we can recognize the last part of Lagrange's memoir as a starting
point for the theory of integral transforms, although its merits were scarcely visible
to Todhunter, writing in 1865. However, they were at once appreciated by Laplace,
who refers to 'la belle méthode que vous donnez' ('the fine method that you give') in a letter written to Lagrange on
11 August 1780, and who subsequently made the technique a basic part of his attack on the
problem of combining observations.


I am very grateful to Dr A. Fletcher for his invaluable suggestions and guidance on
astronomical matters, and for greatly improving my translations. 
MULTIVARIATE LINEAR STRUCTURAL RELATIONS


By R. L. BROWN AND F. FEREDAY
British Coal Utilization Research Association, Leatherhead, Surrey 


Given $n$ observations of $m$-variates having known errors, the envelope of primes, associated with a
given probability level, is shown to be a quadric primal the nature of which determines the
acceptability [...]

[...] acceptable relation if every prime through it is acceptable.
The consequences of this definition are shown to be consistent and are contrasted with
some [...] properties of Tintner's method. The coefficients in the relation are not estimated [...]


1. INTRODUCTION 

For the study of the linear structural relation between two variates, Brown (1957) has 
proposed an envelope method, which yields a region bounding all relations acceptable at 
some assigned probability level. The shape of the acceptance region determines whether or 
not the evidence is in accord with the hypothesis of a linear structural relation. Thus, the 
coefficients in the relation are not examined separately, the relation being treated in toto. 
In this paper the envelope method is generalized to the case of one or more linear structural 
relations amongst m-variates. 

Let $x_1, x_2, \dots, x_n$ be $n$ observations of a vector variate $X$, which is multivariate normally
distributed with dispersion matrix $I$, the unit matrix. Each of these normalized observations
may be represented as a point $P_i$ in a flat $m$-dimensional space, deriving from an unknown
true point $Q_i$ with co-ordinates $X_i$. The simplest linear relation that can obtain for
the true points is the primal relation,


$$ a_0 + a'X = 0, \tag{1.1} $$


there exists a [...] such that [...]
of freedom, which can be expressed [...]
of the prime (1.1). Thus if $N_i$ is the [...]


[equation (1.2) illegible]


where $\varepsilon_i$ is the vector error corresponding to $x_i$. Let $z$ be the random variable $a'\varepsilon/(a'a)^{1/2}$. Then
$$ \mathcal{E}(z) = 0, \qquad \operatorname{var}(z) = \frac{a'\,\mathcal{E}(\varepsilon\varepsilon')\,a}{a'a} = \frac{a'a}{a'a} = 1. \tag{1.3} $$
Hence
$$ \sum_{i=1}^{n} z_i^2 = \sum_{i=1}^{n} \frac{(a'\varepsilon_i)^2}{a'a} $$
has a $\chi^2_n$ distribution. But


$$ n\phi = \sum_{i=1}^{n} \frac{(a_0 + a'x_i)^2}{a'a} \tag{1.4} $$
is determined by the observations and the coefficients of the prime only. The remaining
$n(m-1)$ degrees of freedom are associated with the displacements of the true points $Q_i$
from $N_i$. We can now make the probability statement
$$ \Pr(\phi < \phi_p) = p, \tag{1.5} $$
where $p$ is some assigned level of probability. Then the numerical statement (1.5) leads by
the envelope method (§2) to a quadric primal envelope of acceptable primes and, by the
confidence method (§4), to a confidence region for the coefficients $a_0, a$ of the prime jointly.
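The distributional claim behind (1.4) and (1.5) can be checked by simulation. The sketch below assumes NumPy; the prime coefficients, the sample size and the spread of the true points are hypothetical. It places $n$ true points exactly on a prime, adds unit normal errors, and confirms that $n\phi$ averages about $n$ with variance about $2n$, as a $\chi^2_n$ variable should, whatever the positions of the true points on the prime.

```python
import numpy as np

def phi_statistic(x, a0, a):
    """phi of (1.4): n * phi = sum_i (a0 + a'x_i)^2 / a'a, for an (n, m) array x."""
    resid = a0 + x @ a
    return float(np.sum(resid ** 2) / (a @ a)) / x.shape[0]

rng = np.random.default_rng(0)
n, m = 25, 4
a0, a = 1.0, np.array([1.0, -2.0, 0.5, 3.0])   # hypothetical prime a0 + a'X = 0

# True points Q_i: scatter them widely, then project each one onto the prime.
q = rng.normal(scale=5.0, size=(n, m))
q -= np.outer((a0 + q @ a) / (a @ a), a)

# Observed points = true points + N(0, I) errors; repeat the experiment many times.
draws = np.array([n * phi_statistic(q + rng.normal(size=(n, m)), a0, a)
                  for _ in range(4000)])
print(draws.mean(), draws.var())   # roughly n and 2n
```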
Although the algebra thus far is similar to that given elsewhere, for example in the study
of canonical correlations and principal components (Hotelling, 1933, 1936; Bartlett, 1934,
1937, 1941), the objective here is different in that we do not assume a priori that there is
a structural relation. Nor are we concerned primarily with estimation although, having
shown the existence of a relation, we can afterwards indicate a 'best' relation. It is then
not necessary to give limits for the individual coefficients in the relation and indeed these
are interrelated (cf. Brown, 1957).
Tintner (1945), using results due to Fisher (1938) and Hsu (1939, 1940), has shown that
the rank of the variance-covariance matrix of the observations can be established from
the roots $\phi_1, \dots, \phi_m$ of the determinantal equation (cf. equation (2.3) below) by applying a
$\chi^2_{\nu_r}$ test to the statistic
$$ \Lambda_r = (n-1)(\phi_1 + \dots + \phi_r). \tag{1.6} $$
Here $\nu_r$ is $r(n-m+r-1)$, the degrees of freedom associated with an $(m-r)$-fold in $m$
dimensions. This follows by noting that $r$ stars of primes through $r$ non-intersecting $(r-2)$-folds
determine once, and once only, every $(m-r)$-fold in the space and that a prime through
an $(r-2)$-fold has $(m-r+1)$ degrees of freedom. This test establishes whether the true
points derive from an $(m-r)$-fold, in which case we propose to call the relations partial relations of order $r$,
there being $r$ independent linear equations.
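The degrees of freedom quoted for (1.6) follow from a parameter count: $rn$ residual co-ordinates, less the $r(m-r+1)$ parameters fixed by the $(m-r)$-fold. A small helper (the name is ours) makes the count explicit; for $m = 4$ it gives $n - 4$ for a prime and $3n - 6$ for a line, matching the 4 and $2(m-1) = 6$ parameters of those figures.

```python
def tintner_df(n, m, r):
    """Degrees of freedom nu_r = r(n - m + r - 1) for an (m - r)-fold in m dimensions.

    Parameter count: r*n residual co-ordinates, less r primes through
    (r - 2)-folds with (m - r + 1) free parameters apiece.
    """
    return r * n - r * (m - r + 1)

for r in (1, 2, 3):
    print(r, tintner_df(25, 4, r))   # prime, plane and line in four dimensions
```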

Tintner (see §6) appears to have defined the $(m-r)$-fold in terms of the perpendiculars
from the observations.* As an alternative to this perpendiculars definition, we propose the
rational definition that every prime through an $(m-r)$-fold must be acceptable if the
structural relation is partial of order $r$ (§3). This definition is obviously in accord with the
mathematical requirement that any $r$ independent primes define an $(m-r)$-fold. Taking the view
that the confocal system of quadric primals derived from equations (1.4) and (1.5) is to be regarded
as a transformation into true point space of the errors, the degrees of freedom of $n\phi$ are $n$,
whatever the order of the partial relation. Whereas for Tintner the latent roots $\phi$ have a
probability distribution, the envelope method yields fixed $\phi_i$, the probability $p$
being associated with $\phi_p$ in equation (1.5). If it were felt that the coefficients of the quadric
primal had been 'estimated' from the observations, it would be appropriate to diminish
the degrees of freedom accordingly, provided it could be shown that $n\phi$ could be split into
independent parts, each separately having a $\chi^2$ distribution. Our present view is that such
a procedure would be incorrect.

* It is shown in §6 that the envelope method also may be applied to partial relations under the
perpendiculars definition.
The rational definition of partial relations is easily seen to lead to the result that the order
of partial relations cannot be derived from consideration of the relations holding amongst 
subsets of the variates. This fact shows (§ 7) the inadequacy of confluence analysis (Frisch, 
1934; Mudgett & Frisch, 1931; Stone, 1945). 

An advantage of the envelope method is that in practical examples we may choose 
unambiguously to calculate partial relations of order r, either as the intersection of r primes 
(using for this purpose the principal primes corresponding to the r smallest roots of the 
determinantal equation), or in parametric form (using the principal axes associated 
with the largest roots). If two or more of the roots are nearly equal their use can thereby 
be avoided and the necessity of carrying a large number of figures in the numerical work 
is obviated. 

The work is limited to linear relations. It will not indicate the presence of two or more 
independent partial relations. Thus, where the true points lie on a pair of distinct lines in 
four dimensions, the test would lead either to a twofold or a threefold, according as the 
lines met or were skew. Such independent partial relations are a degenerate case of curvi- 
linear partial relations (the pair of lines being a degenerate conic or twisted cubic). There is, 
however, one simple but important case that can be settled by the present analysis. There
may be present an element $x_j$ of the vector $x$ that is irrelevant to the other variables; then
$x_j = \text{constant}$ is an acceptable primal relation. Directly, however, $\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2/(n-1)$ is an
estimate of a unit variance having $(n-1)$ degrees of freedom and therefore may be tested
against $\chi^2_{n-1}/(n-1)$. It will be assumed that such 'variable constants' are omitted from
the analysis.


2. ACCEPTANCE QUADRIC
The variance-covariance matrix $U$ of the observations is composed of elements $U_{jk}$, where
$$ U_{jk} = \frac{1}{n} \sum_{i=1}^{n} x_{ij} x_{ik}, \tag{2.1} $$
the origin being at the mean of the observations. Then from (1.4)
$$ \phi = \frac{a_0^2 + a'Ua}{a'a}, \tag{2.2} $$
and the stationary values $\phi_i$, $i = 1, 2, \dots, m$, of $\phi$ are latent roots of $U$, i.e. of
$$ |U - \phi I| = 0. \tag{2.3} $$


The primes corresponding to these stationary values are derived from the latent vectors
of $U$; writing $L$ for the $m \times m$ matrix whose rows are the $m$ latent vectors,
$$ LU = \Lambda L, \qquad LL' = I, \tag{2.4} $$
for $L$ is an orthogonal matrix; $\Lambda$ is the diagonal matrix with elements $\phi_i$.
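Numerically, (2.1) to (2.4) amount to a symmetric eigen-decomposition. A NumPy sketch on hypothetical data (the column scalings are arbitrary) computes $U$ and its latent roots and vectors, and checks that $\phi$ for the prime through the mean normal to each latent vector equals the corresponding root:

```python
import numpy as np

def dispersion_matrix(x):
    """U of (2.1): U_jk = (1/n) sum_i x_ij x_ik with the origin at the mean."""
    xc = x - x.mean(axis=0)
    return xc.T @ xc / x.shape[0]

def phi_through_mean(u, a):
    """phi of (2.2) for a prime through the mean (a0 = 0): a'Ua / a'a."""
    return float(a @ u @ a / (a @ a))

rng = np.random.default_rng(1)
x = rng.normal(size=(25, 4)) * np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical data
u = dispersion_matrix(x)

roots, vectors = np.linalg.eigh(u)   # latent roots in ascending order
L = vectors.T                        # rows are latent vectors, so L U L' = Lambda
print(roots)
print([phi_through_mean(u, row) for row in L])   # reproduces the roots
```

Any other prime through the mean gives a value of $\phi$ between $\phi_1$ and $\phi_m$, so the roots are the stationary values.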
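For any chosen significance level $\phi_p$ of $\phi$, the prime (1.1) is constrained by the relationship
(2.2). The envelope of all such primes is found by eliminating the coefficients from the
equations
$$ a_0 + a_i X_i = 0, \qquad a_0^2 + a_j a_k (U_{jk} - \phi\,\delta_{jk}) = 0, $$
$$ \frac{\partial}{\partial a_s}\{a_0^2 + a_j a_k (U_{jk} - \phi\,\delta_{jk})\} = \lambda\,\frac{\partial}{\partial a_s}(a_0 + a_i X_i) \quad (s = 0, \dots, m;\ i, j, k \text{ summed to } m), \tag{2.5} $$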
where $\lambda$ is a constant factor. If $U^{jk}$ is the cofactor of the $(j, k)$th element in $|U - \phi I|$, the
envelope is easily seen to be the quadric primal
$$ |U - \phi I| + U^{jk} X_j X_k = 0, \tag{2.6} $$
which we may rewrite in matrix form as
$$ 1 + X'(U - \phi I)^{-1} X = 0. \tag{2.7} $$


As shown above, we have an orthogonal matrix $L$ formed by the latent vectors of $U$;
we transform to new co-ordinates $Y$, where
$$ Y = LX. \tag{2.8} $$
Now
$$ (L'Y)'(U - \phi I)^{-1}(L'Y) = Y'L(U - \phi I)^{-1}L'Y = Y'\,[L(U - \phi I)L']^{-1}\,Y. $$
But $LUL' = \Lambda$ and $LIL' = I$.
Hence the envelope in $Y$-co-ordinates is
$$ 1 + Y'(\Lambda - \phi I)^{-1} Y = 0. \tag{2.9} $$


But $(\Lambda - \phi I)$ is a diagonal matrix with elements $(\phi_i - \phi)$; hence $(\Lambda - \phi I)^{-1}$ is a diagonal
matrix and we have for the envelope
$$ \sum_{i=1}^{m} \frac{Y_i^2}{\phi_i - \phi} + 1 = 0. \tag{2.10} $$
Obviously the principal primes of the quadric primal (2.10) are the co-ordinate primes
$Y_i = 0$, and also the primes (cf. (2.3), (2.4)) corresponding to the stationary values $\phi_i$ of $\phi$. The
$Y$-co-ordinates are a canonical set such that the covariances of the observations are zero.
For different $\phi$ the quadrics are confocal.
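The envelope property can be verified directly in $Y$-co-ordinates, where $U$ becomes $\Lambda$: pick a point $T$ on (2.10) with $\phi = \phi_p$, take the tangent prime there, and evaluate $\phi$ for it. The latent roots, $\phi_p$ and the point below are hypothetical; the tangent prime comes out with $\phi = \phi_p$, as the envelope construction requires.

```python
import numpy as np

def phi_of_prime(roots, b0, b):
    """phi for the prime b0 + b'Y = 0 in Y-co-ordinates, where U becomes diag(roots)."""
    return float((b0 ** 2 + roots @ b ** 2) / (b @ b))

roots = np.array([0.8, 1.2, 2.0, 5.0])   # hypothetical latent roots phi_1..phi_4
phi_p = 1.5                              # hypothetical critical value of phi
d = phi_p - roots                        # (2.10) reads sum_i Y_i^2 / d_i = 1

# A point T on the acceptance quadric: fix T_2..T_4 and solve for T_1 (d_1 > 0).
t_rest = np.array([0.1, 1.0, 1.0])
t1_sq = d[0] * (1.0 - np.sum(t_rest ** 2 / d[1:]))
T = np.concatenate(([np.sqrt(t1_sq)], t_rest))

# Tangent prime at T: sum_i T_i Y_i / d_i = 1, i.e. b0 = -1 and b_i = T_i / d_i.
b0, b = -1.0, T / d
print(phi_of_prime(roots, b0, b))   # equals phi_p up to rounding
```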
It is well known that through any point $T$ there are $m$ confocals (for which $\phi$ takes the
values $t_i$) and that the tangent primes $\tau_i$ to these confocals are mutually orthogonal; thus
$$ T_j^2 = \frac{\prod_{i=1}^{m} (t_i - \phi_j)}{\prod_{k \ne j} (\phi_k - \phi_j)}, \qquad j = 1, \dots, m. \tag{2.11} $$
Also we have the inequalities
$$ \phi_1 < t_1 < \phi_2 < t_2 < \dots < \phi_m < t_m, \tag{2.12} $$
associated with the latent roots $\phi_i$, arranged in ascending order of magnitude, of the
dispersion matrix $U$ of the observations.
Any prime through $T$ may be written, the $\tau_i$ being normalized so that their normals are unit vectors,
$$ \sum_{i=1}^{m} \lambda_i \tau_i = 0, \tag{2.13} $$
and the corresponding statistic $\phi$ is easily seen, using equations (2.11), to be
$$ \phi = \frac{\sum_{i=1}^{m} \lambda_i^2 t_i}{\sum_{i=1}^{m} \lambda_i^2}, \tag{2.14} $$
so that $\phi$ is stationary when the prime through $T$ is one of the tangent primes $\tau_i$. It is an
immediate corollary that the primes through the intersection of the primes $\tau_1, \tau_2, \dots, \tau_r$
have stationary values of $\phi$ when they coincide respectively with $\tau_1, \tau_2, \dots, \tau_r$, and the
stationary values are $t_1, t_2, \dots, t_r$.
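A convenient way to see (2.11) to (2.14) numerically: for a prime through $T$ with unit normal $c$, the coefficients are $\beta_0 = -c'T$ and $\beta = c$, so $\phi = c'(\Lambda + TT')c$, a Rayleigh quotient. The $t_i$ are then the eigenvalues of $\Lambda + TT'$ and the tangent primes $\tau_i$ have its eigenvectors as normals. The NumPy sketch below uses hypothetical roots and a hypothetical point $T$:

```python
import numpy as np

roots = np.array([0.8, 1.2, 2.0, 5.0])   # hypothetical latent roots (ascending)
T = np.array([0.7, -0.4, 1.1, 0.3])      # hypothetical point T in Y-co-ordinates

# phi for a prime through T with unit normal c is c'(Lambda + T T')c.
M = np.diag(roots) + np.outer(T, T)
t, normals = np.linalg.eigh(M)   # t_1 <= ... <= t_m; columns are normals of tau_i

def phi_through_T(c):
    """phi of the prime through T with normal c (b0 = -c'T, b = c)."""
    return float(((c @ T) ** 2 + c @ (roots * c)) / (c @ c))

# Each t_i labels a confocal through T: sum_j T_j^2 / (t_i - phi_j) = 1.
print([float(np.sum(T ** 2 / (ti - roots))) for ti in t])   # each close to 1
print(roots, t)   # interlaced as in (2.12)
```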


3. PRIMAL AND PARTIAL RELATIONS 


For any chosen significance level of $\phi$, say $\phi_p$, associated with probability level $p$, we prove
that the necessary and sufficient conditions for a partial relation of order $r$ to obtain between
the variates are
$$ \phi_r \le \phi_p < \phi_{r+1}. \tag{3.1} $$

Take any point $T$ on the acceptance quadric (2.10) for which $\phi$ is $\phi_p$. Then $\phi_p$ is one of the
parameters $t_i$ that serve to define the $m$ confocals through $T$. Let $\phi_p$ be $t_r$. Then by equation
(2.12) there are $(r-1)$ smaller values of $t_i$. Let $M_r$ be an $(m-r)$-fold tangent to the acceptance
quadric at $T$ and formed by the intersection of the $r$ tangent primes $\tau_1, \dots, \tau_r$. Then every
prime through $M_r$ determines a statistic $\phi$ that has stationary values when the prime coincides
with $\tau_1, \tau_2, \dots, \tau_r$, respectively, and therefore has a maximum at $t_r = \phi_p$. Thus $M_r$ is an
acceptable $(m-r)$-fold. If $\phi_r = \phi_p$, the $(m-r)$-fold $Y_1 = \dots = Y_r = 0$ is just acceptable. It follows
from equation (2.12) that (3.1) are necessary conditions if there are to be acceptable $(m-r)$-folds
tangent to the acceptance quadric $\phi_p$. Obviously (cf. equations (4.4), (4.5) below) every
$(m-r)$-fold parallel to $M_r$ and nearer to the origin is such that the maximum value of $\phi$, for
all primes through it, is less than $\phi_p$; if the $(m-r)$-fold is further than $M_r$ from the origin
there will be some prime through it for which $\phi$ exceeds $\phi_p$. Thus the acceptance quadric
$\phi_p$ bounds the acceptable $(m-r)$-folds of type $M_r$.

Through the point $T$ on the acceptance quadric $\phi_p$ there are acceptable $(m-r)$-folds
tangent to the quadric and not coinciding with $M_r$. It is not, however, essential to classify
them, since, for the partial relations to be of order $r$ at least, it is enough to show that there
is at least one acceptable $(m-r)$-fold. It is geometrically intuitive that we may rotate $M_r$
about the point $T$ and in the tangent prime $\tau_r$ as far as the generating surface through $T$
without causing the $(m-r)$-fold to cease to be acceptable, the generating surface being the
generalization of the pair of generating lines through any point of a quadric in three dimensions.
In other words, we may rotate $M_r$ until it passes outside the quadric (see §5). Once
we pass through the generating surface, however, the $(m-r)$-folds become unacceptable.
Analogously, in two dimensions, there is a line of 'worst' fit as well as one of 'best' fit.
Finally, the acceptance quadric $\phi_p$ bounds all acceptable $(m-r)$-folds.

It remains to show that the conditions (3.1) are sufficient. Let $M_{r+1}$ be the $(m-r-1)$-fold
through the origin formed by the intersection of the $r+1$ co-ordinate primes $Y_1, Y_2, \dots, Y_{r+1}$.
These are stationary primes and hence there is one prime through $M_{r+1}$ for which $\phi$ is $\phi_{r+1}$,
which exceeds $\phi_p$. Hence $M_{r+1}$ is not acceptable. For any other $(m-r-1)$-fold containing
the origin, the maximum value of $\phi$ for primes through it will exceed $\phi_{r+1}$. For any $(m-r-1)$-fold
not through the origin, the maximum value of $\phi$ for primes through it exceeds that for
a parallel $(m-r-1)$-fold containing the origin. Hence no $(m-r-1)$-fold is acceptable.

For $\phi_p < \phi_1$, the acceptance quadric is imaginary and there is no acceptable linear
relation. For $\phi_p = \phi_1$ the co-ordinate prime $Y_1 = 0$ is just acceptable; the acceptance quadric
becomes a focal quadric in the prime $Y_1 = 0$. For $\phi_1 < \phi_p < \phi_2$ a prime is acceptable, and
these are bounded by the acceptance quadric $\phi_p$, which is a generalized hyperboloid. For
$\phi_{m-1} \le \phi_p < \phi_m$ a line is acceptable, the equality obtaining for the co-ordinate axis formed
by the intersection of the co-ordinate primes $Y_1, Y_2, \dots, Y_{m-1}$. For $\phi_p = \phi_m$ no relation obtains,
as is also the case for $\phi_p > \phi_m$, when the acceptance quadric becomes a generalized ellipsoid:
it could be said that the relation becomes indeterminate in direction and that only points
lying inside the ellipsoid are acceptable.

We have shown that a rational mathematical definition of a partial relation of order $r$,
namely that every prime through an $(m-r)$-fold must be acceptable, leads to a simple
and self-consistent statistical test as given in equation (3.1), and also to a useful geometrical
picture for the location of acceptable $(m-r)$-folds in relation to the acceptance quadric $\phi_p$.
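Condition (3.1) turns the classification into a lookup among the ordered latent roots. A small Python sketch (function name and roots are hypothetical) returns $r$, so that an $(m-r)$-fold is acceptable, and flags the two boundary cases discussed above:

```python
from bisect import bisect_right

def order_of_relation(roots, phi_p):
    """Order r of the partial relation under (3.1): phi_r <= phi_p < phi_{r+1}.

    Returns 0 if phi_p < phi_1 (no acceptable linear relation) and None if
    phi_p >= phi_m (no relation obtains: the direction is indeterminate).
    """
    ordered = sorted(roots)
    r = bisect_right(ordered, phi_p)   # number of roots not exceeding phi_p
    return None if r == len(ordered) else r

roots = [0.8, 1.2, 2.0, 5.0]   # hypothetical latent roots, m = 4
for phi_p in (0.5, 0.8, 1.5, 2.0, 6.0):
    print(phi_p, order_of_relation(roots, phi_p))   # r = 1 prime, 2 plane, 3 line
```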


4. CONFIDENCE QUADRIC
As has been shown previously (Brown, 1957), we may also derive a joint confidence region
for the coefficients defining a prime. In $Y$-co-ordinates, let the prime be
$$ \beta_0 + \beta'Y = 0. \tag{4.1} $$
From equation (2.4), the variance-covariance matrix of the observations is, after transforming
to $Y$-co-ordinates, the diagonal matrix $\Lambda$ and we have
$$ \phi = \frac{\beta_0^2 + \sum_{k=1}^{m} \phi_k \beta_k^2}{\sum_{k=1}^{m} \beta_k^2}. \tag{4.2} $$
Then for an assigned $\phi_p$, we may write the confidence quadric as
$$ \beta_0^2 + \sum_{k=1}^{m} (\phi_k - \phi_p)\,\beta_k^2 = 0 \tag{4.3} $$
in homogeneous co-ordinates $(\beta_0, \beta_1, \dots, \beta_m)$. This quadric has the same principal primes
as the acceptance quadric. If all $\phi_k$ exceed $\phi_p$, this quadric is imaginary and there are no
suitable primes.

If $(\beta_0, \beta_1, \dots, \beta_m)$ is a point in confidence space corresponding to any general prime and
$(\beta_0', \beta_1, \dots, \beta_m)$ is a parallel prime for which the associated point in homogeneous coefficient
space lies on (4.3), then
$$ \beta_0'^2 = \beta_0^2 + (\phi_p - \phi) \sum_{k=1}^{m} \beta_k^2, \tag{4.4} $$
where $\phi$ is associated with the general prime. Thus
$$ |\beta_0| \lessgtr |\beta_0'| \quad \text{according as} \quad \phi \lessgtr \phi_p, \tag{4.5} $$
and we can say that all points lying 'inside' (4.3) yield acceptable primes, and vice versa.

But further interpretation depends on how we change to the ratios of the $(m+1)$ parameters
$\beta_s$, $s = 0, \dots, m$. For $\phi_1 < \phi_p < \phi_2 < \dots < \phi_m$ and $\beta_1 = 1$, (4.3) is a generalized ellipsoid
and the confidence region is bounded. Putting unity for any other $\beta$ yields a generalized
hyperboloid; moreover, in terms of the coefficients $a$ in $X$-space the quadric is
non-central. Although both the acceptance quadric and the [confidence quadric] ([...]
treated) yield the same answer, the complications ensuing from (4.3) do not appear
worth pursuing.
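The equivalence in (4.5) rests on the identity $\beta_0^2 + \sum_k (\phi_k - \phi_p)\beta_k^2 = (\phi - \phi_p)\sum_k \beta_k^2$, which follows from (4.2); so a prime is acceptable exactly when its coefficient point lies 'inside' (4.3). A NumPy sketch with hypothetical roots and randomly drawn primes checks this, together with the shift formula (4.4):

```python
import numpy as np

def phi_of_prime(roots, b0, b):
    """phi of (4.2) for the prime b0 + b'Y = 0 in Y-co-ordinates."""
    return float((b0 ** 2 + roots @ b ** 2) / (b @ b))

def confidence_form(roots, phi_p, b0, b):
    """Left-hand side of (4.3); non-positive means 'inside' the confidence quadric."""
    return float(b0 ** 2 + (roots - phi_p) @ b ** 2)

roots = np.array([0.8, 1.2, 2.0, 5.0])   # hypothetical latent roots
phi_p = 1.5                              # hypothetical critical value of phi
rng = np.random.default_rng(2)

mismatches, worst = 0, 0.0
for _ in range(1000):
    b0, b = rng.normal(), rng.normal(size=4)
    phi = phi_of_prime(roots, b0, b)
    mismatches += (phi <= phi_p) != (confidence_form(roots, phi_p, b0, b) <= 0.0)
    # (4.4): the parallel prime with this b0' lies on the quadric (4.3).
    b0_shift_sq = b0 ** 2 + (phi_p - phi) * (b @ b)
    if b0_shift_sq >= 0.0:
        worst = max(worst, abs(confidence_form(roots, phi_p, np.sqrt(b0_shift_sq), b)))
print(mismatches, worst)   # expected: 0 mismatches and a rounding-level residual
```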
5. THREE DIMENSIONS
It appears worthwhile to give in more detail the results obtaining in three dimensions,
in particular to prove the fundamental property of the acceptance quadric
suggested in §3, namely, that every line lying wholly inside the quadric $\phi_p$ is acceptable
at probability level $p$, and conversely.
It is clear that the two generating lines through any point on the acceptance quadric
have a special property. One system of generators is given by


Ege Y, 
Ve, i) Vb A ( . 600 : 
ce -- 2 106-76. 0 
A(ó, m $)U Js d. $,) A 5 3 $2) 
from which it follows easily that every plane through the generator has $\phi = \phi_p$. To [see this,]
write down the value of $\phi$ for any plane through the generator in the form $\sigma A + \mu B$, [where]
$A$ and $B$ are the planes of (5.1). It will be found that both $\sigma$ and $\mu$ disappear. [...]

[...] planes through it is $\phi_p$. As this line is rotated in the tangent plane, it passes at the generating
line through a state where all planes through it have $\phi$ equal to $\phi_p$. Then on further [rotation,
$\phi_p$] becomes a minimum value of $\phi$, not a maximum, so that the line ceases to be acceptable.
Clearly, in passing through the generating line, we pass from the situation for which [every]
point on the tangent line is inside the quadric to one where the tangent lies outside [it].

[The condition that the line]
$$ Y_1 = b_1 + l_1 u, \qquad Y_2 = b_2 + l_2 u, \qquad Y_3 = b_3 + l_3 u \tag{5.2} $$
shall be inside the quadric is
$$ \sum_{i=1}^{3} \frac{(b_i + l_i u)^2}{\phi_p - \phi_i} < 1, \quad \text{for all } u. \tag{5.3} $$


Hence we must have 


ESQ4. 4 HD qw 
E ue o Ra . 
n 7 four E RC 
(one of these implies the other) and if 


„ 


-————4-.—33. 3 

9 91 o-ha Fg 
then HUC 

Bi ute 
which may be rewritten as


(699 (6, ge) , ge (g 9)" G i e) P 
where $m_1 = (l_2 b_3 - l_3 b_2)$, etc.
We have now to write down the conditions that every plane through the line shall have
$\phi \le \phi_p$. If the plane
$$ c_0 + c_1 Y_1 + c_2 Y_2 + c_3 Y_3 = 0 $$
is to contain the line (5.2), then
$$ c_0 + c_1 b_1 + c_2 b_2 + c_3 b_3 = 0, \qquad c_1 l_1 + c_2 l_2 + c_3 l_3 = 0, $$
and we may obtain a pencil of planes $\lambda$ by taking*
e 66S À = 0, (5:10) 
Then e, = — 
where b b. b, 
4=|4 h t |- memen w-h-h ete. (511) 
BN M 


All planes are acceptable if 

| -E lfp) mi -X(6,-9)m, 
be, ehm, Pe- 
and Eip,- d.) m> 0. (5-13) 
It is also implied that X — 9) n?» A*. Remembering that 1 („ ,) is negative, (5-7) 
and the first of (5:4) show that (5-13) is valid. Now (5-12) gives 


($y — $1) (Gp — $2) (mini — mini) + two similar terms, 
> A*{(¢, i) mi +similar terms}, 


and it is a well-known property of determinants that (mjnj— mini) HA, eto. Thus 
(5-7), (5-12) are the same and the proposition is proved. 

Lastly, we note that the subsets of two variates, say [...], have a determinantal equation
with roots $\phi_1', \phi_2'$ such that
$$ \phi_1 \le \phi_1' \le \phi_2 \le \phi_2' \le \phi_3. \tag{5.14} $$
It is easy to write down $\phi_1', \phi_2'$ and substitute in the cubic giving $\phi_1, \phi_2, \phi_3$, and the proof
follows. A general proof follows from a result given by Turnbull & Aitken (1932). If we have
a line in three dimensions, $\phi_2 \le \phi_p < \phi_3$, and therefore we also have a line in each of the two
dimensions formed by omitting each variate in turn, for $n\phi$ has a $\chi^2_n$ distribution whatever
$m$ may be. The converse is not true. These two findings are to be expected for a truly
structural relation.
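(5.14) is an instance of the interlacing (separation) property of the latent roots of a symmetric matrix and its principal submatrices, which we take to be the kind of result cited. A NumPy check on hypothetical correlated data, omitting each variate in turn:

```python
import numpy as np

def subset_roots_interlace(u, tol=1e-12):
    """Check phi_1 <= phi'_1 <= phi_2 <= phi'_2 <= phi_3 for every pair of variates."""
    phi = np.linalg.eigvalsh(u)                      # ascending latent roots of U
    for keep in ([0, 1], [0, 2], [1, 2]):            # omit each variate in turn
        sub = np.linalg.eigvalsh(u[np.ix_(keep, keep)])
        if not (phi[0] - tol <= sub[0] <= phi[1] + tol
                and phi[1] - tol <= sub[1] <= phi[2] + tol):
            return False
    return True

rng = np.random.default_rng(3)
x = rng.normal(size=(25, 3)) @ rng.normal(size=(3, 3))   # hypothetical data
xc = x - x.mean(axis=0)
u = xc.T @ xc / len(x)                                     # U as in (2.1)
print(subset_roots_interlace(u))   # True
```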


* In four dimensions we should have also a fourth equation in which the signs of $c_s$ are alternately
plus and minus.
† But from the perpendiculars definition (cf. §6) a line in [...] and a line in two dimensions
requires [...]. Since $\chi^2_{2n} < 2\chi^2_n$, [...] and it does not follow that [...]
is not necessarily a line in two dimensions. For this reason the perpendiculars
definition is less satisfactory than the rational definition.


v (512) 
6. ENVELOPE METHOD APPLIED TO PERPENDICULARS DEFINITION
As already noted, another definition of an acceptable $(m-r)$-fold can be based on the
requirement that the sum of the squares of the perpendiculars on to the $(m-r)$-fold shall
give an estimate of the error variance in accord with the appropriate independent [...]
variance derived from the known errors of measurement. This perpendiculars definition
leads to Tintner's test for a statistic $\psi$ having a $\chi^2$ distribution (cf. equation (1.6)). We shall
show that neither definition leads to results derivable from the other, but that, in practice,
the rational definition will usually lead to a smaller area for acceptable lines (in three dimensions)
than does the perpendiculars definition. For simplicity the treatment is confined to
three dimensions.
Our condition $\phi_1 < \phi_2 < \phi_p$ implies $(\phi_1 + \phi_2) < 2\phi_p$. Now for the interesting range of
probability $p$ exceeding 0.75, $\chi^2_{2n} < 2\chi^2_n$, so that $\psi_p < 2\phi_p$.* Thus, we cannot deduce
$(\phi_1 + \phi_2) < \psi_p$. Conversely, the perpendiculars definition tells us that for the plane $\phi_1 < \phi_p$,
and for the line $(\phi_1 + \phi_2) < \psi_p$. Both relations obtain if a line is to be accepted; but it does
not follow that $\phi_2 < \phi_p$. It is, however, possible that for some lines there is a relation between
the associated $\phi$ and $\psi$ that would enable a probability level $p$ to be chosen so that one
implied the other. We show this is not the case. Consider the line in $Y$-space


Y=apthu; HTA -I. apeabeap-id, dih T daa L dsa = 0 
and take (Li La La) to be a unit vector orthogonal to (4,2505) and to (1,1,/,). Then any p 
containing the line is 
(a5 + AL) Y, () Yo + (aA Y; = p 
and the statistic ¢ for the pencil of planes through the line has a maximum value Øm 
29m = V+ {p-p (Lip + 139, LAS - 039,9, - 030,6, - 82.02). 
where y= -W $+ (1—B) d+ (1-28) by +p. 


The perpendiculars definition yields the statistic $\psi$. Hence $2\phi_m > \psi$. But $\chi^2_{2n} < 2\chi^2_n$, and the
result follows.
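The step "$\chi^2_{2n} < 2\chi^2_n$ for $p$ exceeding 0.75" compares upper percentage points. A self-contained Python check for $n = 25$ (the sample size of §7) uses a hand-written chi-square quantile, so that no statistics library is assumed; the inequality holds at $p = 0.75$, 0.95 and 0.99 and fails at $p = 0.5$.

```python
import math

def chi2_cdf(x, k):
    """Chi-square CDF with k d.f., via the series for the regularized gamma P(k/2, x/2)."""
    if x <= 0.0:
        return 0.0
    s, y = 0.5 * k, 0.5 * x
    term = 1.0 / s          # first term of sum_n y^n / (s (s+1) ... (s+n))
    total = term
    j = 1
    while term > 1e-16 * total:
        term *= y / (s + j)
        total += term
        j += 1
    return math.exp(s * math.log(y) - y - math.lgamma(s)) * total

def chi2_ppf(p, k):
    """Chi-square quantile by bisection on chi2_cdf."""
    lo, hi = 0.0, k + 40.0 * math.sqrt(2.0 * k) + 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 25
for p in (0.5, 0.75, 0.95, 0.99):
    print(p, chi2_ppf(p, 2 * n) < 2 * chi2_ppf(p, n))
```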

To gain a closer insight into the relationship of the two tests, we may find the envelope
of lines (6.1) having a constant $\psi$. Since $\psi$ is independent of the direction $(d_1, d_2, d_3)$ of the
perpendicular from the origin on to the line, equivalently we find the envelope of the cylinder
of lines distant $p$ from the origin and in the direction $(l_1, l_2, l_3)$. This is easily seen to be a quartic


Y: wal vr A Y: 
BiAhrA-'PGYAi4A-Qy ET g yT 
where R = (Y}+ T2 T3). 


This surface is real only if [...]. In practice, it is usually the case that [...],
and therefore near the origin (mean of
observations) we can find an approximation to the quartic. A simpler result, sufficiently
similar to the foregoing approximation for our present purpose, follows by considering the


* Tintner gives $\psi$ a $\chi^2/(n-1)$ distribution for an 'estimated' line; here we have no estimated
line and so take $\chi^2_{2n}/n$ (cf. discussion of degrees of freedom in the Introduction).
special case $\phi_1 = \phi_2$, when the quartic becomes the quadric of revolution


+ 
TN NE ^ 


The corresponding quadric obtained from the rational definition is


Fe wt, (6-7) 


Both quadrics are hyperboloids of one sheet. For $\phi_1 = \phi_2$ we can put [...] and
compare the hyperbolas [...]


where $A$ is less than 2, but usually near it. We see from Fig. 1 that the rational definition
limits the region of acceptable lines more than does the perpendiculars definition.


[Figure: two envelope curves, labelled 'Rational definition' and 'Perpendiculars definition']


Fig. 1. Approximate comparison of envelopes of acceptable lines in three dimensions.


Both definitions lead to statistically valid results and each [has its]
use. But the perpendiculars definition does not ensure that every plane through [the accepted]
lines is itself acceptable. Since the latter appears to be a requirement when we
are considering the presence or absence of structural relations, the
rational definition should be favoured. Fortunately the 'best' relations arising from the
two definitions are the same.
7. EXAMPLES
7.1. Introduction


In order to illustrate the foregoing and to compare the different methods of multivariate 


analysis, three examples have been constructed, each of twenty-five observations in four 
dimensions. 


The equations used were: 
Example I: a plane        Example II: a prime         Example III: a line
$Z_1 = 3u + 2v$           $Z_1 = 2u + 2v + w$         $Z_1 = 5u$
$Z_2 = u + 3v$            $Z_2 = u + 2v + 2w$         $Z_2 = 6u$
$Z_3 = 5u - v$            $Z_3 = 3u - v + 2w$         $Z_3 = 7u$
$Z_4 = [...]$             $Z_4 = [...]$               $Z_4 = 4u$


$u, v, w$ are running co-ordinates.

From a table of random numbers, twenty-five values were given to each of u, v, w, dif- 
ferent sets being used for each example. In this way, sets of twenty-five ‘true’ points were 
obtained. To these ‘true’ points were added random normal deviates (obtained from the 
Biometrika tables) after testing these deviates for any chance significant deviation from 


normality. The final ‘observed’ points are given in the Appendix, together with the standard 
deviations of the added error terms. 


7.2. Comparison of rational and perpendiculars definitions


From the data thus obtained, the stationary values of $\phi$ were calculated and these are
shown in Table 1.


Table 1. Stationary values of $\phi$


Roots $\phi_i$


Examination of these roots according to the 'rational definition' involves testing the
roots individually using 25 degrees of freedom. In this case the critical values of $\phi$ are 1.50
for the 0.05 level and 1.77 for the 0.01 level.
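Since $n\phi$ has a $\chi^2_n$ distribution, the critical values are $\phi_p = \chi^2_{n,p}/n$. A self-contained Python sketch (hand-written chi-square quantile; function names are ours) gives about 1.506 and 1.773 for $n = 25$, in agreement with the quoted 1.50 and 1.77 to the precision printed:

```python
import math

def chi2_cdf(x, k):
    """Chi-square CDF with k d.f., via the series for the regularized gamma P(k/2, x/2)."""
    if x <= 0.0:
        return 0.0
    s, y = 0.5 * k, 0.5 * x
    term = 1.0 / s
    total = term
    j = 1
    while term > 1e-16 * total:
        term *= y / (s + j)
        total += term
        j += 1
    return math.exp(s * math.log(y) - y - math.lgamma(s)) * total

def chi2_ppf(p, k):
    """Chi-square quantile by bisection on chi2_cdf."""
    lo, hi = 0.0, k + 40.0 * math.sqrt(2.0 * k) + 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def phi_critical(n, p):
    """phi_p with Pr(phi < phi_p) = p when n * phi is chi-square on n d.f., as in (1.5)."""
    return chi2_ppf(p, n) / n

for p in (0.95, 0.99):
    print(p, round(phi_critical(25, p), 3))   # roughly 1.506 and 1.773
```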


Table 2. Significance of roots according to the ‘rational definition’ 


Significance of roots 


1:362 5˙15˙⁹ 54.99% 
2.213% 18.60% 18 
1-010 1-417 140-6*** 
In Table 2 and the following tables, one asterisk denotes the significance level 0.05, two
asterisks the 0.02 level and three asterisks the 0.01 level.

The rational test reveals at once the nature of the relationships. 

If, however, we examine the roots according to the 'perpendiculars' method (Tintner's
theory) we find the following table:


Table 3. Significance test for 'perpendiculars' method


This leads to exactly the same conclusion as Table 2. It has been explained in an earlier 
section that in many cases the conclusions will be the same, but that cases may arise in 
which there are differences, as the ‘rational’ definition is more exacting than the 'per- 
pendiculars’ definition. 


7.3. Determination of the 'best' relation from the rational definition

All calculations relating to the determination of the best relationship and the testing of
theoretical relationships should be done in the 'normalized' co-ordinates, used throughout
in the theory. Thus, if $Z_i$ represents the standard current co-ordinates, then $z_i$, the normalized
co-ordinates, are given by $z_i = Z_i/\sigma_i$, where $\sigma_i$ is the standard deviation of the error
in $Z_i$. To determine the 'best' relation, the direction cosines of the principal axes of the
acceptance quadric are computed. These are the principal minors of the matrix $U - \phi I$
with $\phi$ given in turn the values $\phi_1, \phi_2, \phi_3, \phi_4$, as in Table 4.


Table 4. Direction ratios of principal axes 


If the relationship has been shown to be primal, only the first row is needed, and it leads to
$$l_{11}z_1 + l_{12}z_2 + l_{13}z_3 + l_{14}z_4 = 0$$
as the ‘best’ relationship.
Equally, of course, this prime could have been obtained as that defined by the remaining three axes of the quadric, but this involves more computation.

On the other hand, if the relationship is a line, then it is defined by the intersection of the 
first three principal primes of the quadric or by the remaining axis of the quadric. This last 
definition involves very little computation and leads at once to 


Where the relationship is a plane, it can be defined either by the first two primes or by 
the last two principal axes of the quadric. The computation is the same in each case and 
there is little to choose unless either the first two or last two roots of the matrix are near 
together, in which case the use of the spaced roots will avoid the necessity of carrying a 
large number of figures. 

In Example I above the direction ratios are 


Table 5. Principal axes, Example I


            1         2         3          4
b_1    +63.44    +37.99    -45.64     -22.76
b_2    -26.84    +41.48     +6.19     -22.94
b_3     +4.33    -87.09    +12.23    -158.77
b_4    -34.49    -12.65    -59.14      +1.44


The relationship has been shown to be a plane, and the ‘best’ plane is therefore defined by
$$63.44z_1 + 37.99z_2 - 45.64z_3 - 22.76z_4 = 0,$$
$$-26.84z_1 + 41.48z_2 + 6.19z_3 - 22.94z_4 = 0.$$

The remaining two rows need not have been computed. After a little rearranging these become
$$z_1 - 0.58z_3 - 0.02z_4 = 0,$$
$$z_2 - 0.23z_3 - 0.56z_4 = 0.$$
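The rearrangement can be checked by solving the two plane equations for $z_1$ and $z_2$ in terms of $z_3$ and $z_4$. The following is a minimal sketch in Python with NumPy that takes the first two rows of Table 5 as printed. It is a numerical check only, not the authors' procedure.

```python
import numpy as np

# Direction ratios b_1 and b_2 from Table 5 (coefficients of z_1, ..., z_4).
b1 = np.array([63.44, 37.99, -45.64, -22.76])
b2 = np.array([-26.84, 41.48, 6.19, -22.94])

# Write the plane as M @ (z1, z2) + Q @ (z3, z4) = 0.
M = np.vstack([b1[:2], b2[:2]])
Q = np.vstack([b1[2:], b2[2:]])

# Solve for (z1, z2) = K @ (z3, z4); row i of K holds the coefficients of z3 and z4.
K = np.linalg.solve(M, -Q)

for i, (c3, c4) in enumerate(K, start=1):
    print(f"z{i} - {c3:.3f} z3 - {c4:.3f} z4 = 0")
```

The computed coefficients are roughly 0.583, 0.020, 0.228 and 0.566. These match the two-decimal values above to within one unit in the second decimal; the last would round to 0.57.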
The ‘true’ equations in Example I, after changing to co-ordinates referred to the mean as origin, were
$$z_1 - 0.61z_3 - 0.12z_4 - 0.28 = 0, \qquad z_2 - 0.27z_3 - 0.60z_4 - 0.21 = 0, \qquad (7.1)$$
showing good agreement with the estimated ‘best’ relationship.


7.4. Test of theoretical relation under rational definition
To test a theoretical relationship, either the value of $\phi$ may be computed (in the case of a prime) or the reality of the intersections with the acceptance quadric may be investigated (all cases). In the case of a prime or a line, neither of these presents any difficulty, but in intermediate cases it is not quite so simple. We again use Example I (the plane (7.1) in four dimensions) to illustrate the technique.
Computation is simplified by changing the axes to the principal axes of the quadric. This is easily effected by a transformation of the type
$$z_i = l_{1i}y_1 + l_{2i}y_2 + l_{3i}y_3 + l_{4i}y_4 \qquad (i = 1, \ldots, 4),$$
where the $l_{ki}$ are given in Table 4.
This transformation of (7.1) leads to the new equations
$$79.94y_1 - 38.65y_2 + 7.05y_3 + 1.25y_4 - 21.56 = 0,$$
$$45.06y_1 + 62.08y_2 + 1.88y_3 + 1.99y_4 - 25.47 = 0.$$
We may now use condition (5.12) above, which must hold if every prime through this plane is to be acceptable. This leads to
$$\begin{vmatrix} 4808.19 & 1545.07 \\ 1545.07 & 1764.00 \end{vmatrix} > 0, \qquad 4808.19 > 0.$$
These conditions are plainly satisfied, and the plane is therefore acceptable.
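The two inequalities are the leading principal minors of a symmetric $2 \times 2$ matrix. Taken together, they say that the matrix is positive definite (Sylvester's criterion). A minimal NumPy check, treating the printed entries as exact:

```python
import numpy as np

# Symmetric 2 x 2 matrix whose leading principal minors are tested above.
G = np.array([[4808.19, 1545.07],
              [1545.07, 1764.00]])

leading = G[0, 0]          # first leading principal minor
det = np.linalg.det(G)     # second leading principal minor

# Both minors positive <=> G positive definite (Sylvester's criterion).
acceptable = leading > 0 and det > 0
print(leading, round(det, 2), acceptable)
```

The determinant comes out at about $6.09 \times 10^6$, which is comfortably positive.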


7.5. Examination of subsets
Finally, it is interesting to examine the subsets amongst the variates. Table 6 gives the roots of the determinantal equations obtained when each variate in turn is omitted. In other words, we treat the experimental data as if we had measurements of only three of the four variates that satisfy the structural relation. Note that these roots fall between those (Table 1) of the four-variate determinantal equation.


Table 6. Roots of determinantal equation for subsets among three variates 


Example OE 
a 174 54-46** 
I omitting 2, Td 1584** 
dd 4-210** 52-75** 
2 .]33** 41:39** 
Es 5:133 
m: .G0** 19:85** 
II omitting 2, PO 850 19-28** 
E] 394** 18-05** 
Ta 17-04** 19-61** 
wy 
Ae P 128-4** 
III omitting z, n 128.3** 
Xs 1-225 53:50** 
7 1-020 112-6** 
1 


Asterisks: *, 0.05; **, 0.01 levels of significance.


In Example I we find that the plane (twofold) in four dimensions yields a plane in each of the three-dimensional subsets. In Example III the line in four dimensions yields a line in each of the three-dimensional subsets. In Example II, however, we find no relation in three cases (as expected) and, in one case, that a plane would be acceptable. Although this must be regarded as accidental, it illustrates the danger of trying to find the nature of a relation amongst $m$ variates by studying the relationships between subsets of those variates.


7.6. Qualitative methods of analysis

If we knew nothing of the errors to which the measurements were subjected, we could calculate the partial correlation coefficients. In Example I, the zero-, first- and second-order correlations between one pair of variates were significant at a probability level of 0.001. In Example II, the zero-, first- and second-order correlations between each of two pairs of variates were significant at 0.001. In Example III, all zero-order correlations were significant at the 0.02 level. Evidently Example III gives a clear pattern suggesting a connexion of each variate with each of the others. For the other examples the pattern is irregular, and it is difficult to draw clear-cut conclusions.

The multiple correlation coefficients were also calculated. These may be regarded as 
giving a measure of the efficiency of regression equations of one variate on one or more of 
the others. In nearly every case the coefficients proved to be significant, suggesting that 
for each example there is an underlying relationship. Except in the case of Example III, 
where every coefficient was significant at a probability level of 0.001, there were non-significant multiple correlations, so that again it was difficult to draw clear-cut conclusions.

The method of confluence analysis has also been applied to these data. Here we compare two of the multiple regressions, for example the regression of $x_i$ on $x_j, x_k$ with that of $x_j$ on $x_i, x_k$. When the regression coefficients are rearranged into a common form, their scatter or proximity indicates how likely it is that a true relation is present. The comparison is qualitative, and the procedure is to plot all the possible combinations. No clear deductions could be made for Examples I and II. Again, however, Example III showed that all the pairs of zero-order regressions were markedly similar, indicating a strong underlying connexion between the variates.


The authors are grateful to Mr W. D. Ray who kindly checked the paper. 


10   9.29
11  23.00
12  47.78
13  -5.96
14  23.94
15  17.25
16  28.42
17   0.06
18  37.33
19  18.84
20  47.45
21  16.71
22  49.96
23  35.48
24  20.40
(b) Example II
22
23 
24 
MULTIVARIATE RATIO ESTIMATION FOR FINITE POPULATIONS


By INGRAM OLKIN 
University of Chicago and Michigan State University 
1. INTRODUCTION AND SUMMARY
In sample surveys, precision in estimating the unknown mean $\bar Y$ of a finite population can be increased by using an auxiliary variable $X$ that is correlated with $Y$ and whose mean $\bar X$ is known. Two such estimates are the ratio and regression estimates. This paper is concerned with extending ratio estimation to the case where several auxiliary variables are used to increase precision.
In the univariate case a simple random sample $(x_1, y_1), \ldots, (x_n, y_n)$ from a finite population $(X_1, Y_1), \ldots, (X_N, Y_N)$ is observed. The mean $\bar X$ is known, and $\bar Y$ is to be estimated. The estimator
$$\hat y = \frac{\bar y}{\bar x}\,\bar X = r\bar X$$
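is called the ratio estimate of $\bar Y$. In general $\hat y$ is biased, and for large $n$, approximations to $E\hat y$ and $V(\hat y)$ are given by
$$E\hat y = \bar Y + \frac{N-n}{Nn}\,\bar Y\,(c_{xx} - c_{xy}),$$
$$V(\hat y) = \frac{N-n}{Nn}\,\bar Y^2\,(c_{xx} + c_{yy} - 2c_{xy}),$$
where $c_{xx} = S_{xx}/\bar X^2$, $c_{yy} = S_{yy}/\bar Y^2$, $c_{xy} = S_{xy}/(\bar X\bar Y)$, and $S_{xy}$ is the covariance between $X$ and $Y$ (Cochran, 1953, pp. 115-16). Hartley & Ross (1954) have shown that
$$\hat y^* = \bar r\bar X + \frac{n(N-1)}{N(n-1)}\,(\bar y - \bar r\bar x)$$
is an unbiased estimator of $\bar Y$, where $n\bar r = \sum_j y_j/x_j$ ($x_j$ is assumed to be positive).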
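Both univariate estimators can be sketched in a few lines of Python. The toy population below is purely illustrative and not from any source. Averaging over every possible sample of size $n$ shows that the Hartley-Ross estimator reproduces $\bar Y$ exactly, whereas the ratio estimate in general does not.

```python
import itertools
import numpy as np

def ratio_estimate(y, x, X_bar):
    """Classical ratio estimate r * X_bar with r = y_bar / x_bar."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    return y.mean() / x.mean() * X_bar

def hartley_ross(y, x, X_bar, N):
    """Hartley-Ross unbiased ratio-type estimate of the population mean of Y."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    n = len(y)
    r_bar = np.mean(y / x)                         # mean of the unit ratios y_j / x_j
    correction = n * (N - 1) / (N * (n - 1)) * (y.mean() - r_bar * x.mean())
    return r_bar * X_bar + correction

# Toy population (illustrative numbers only, not from the paper).
Y_pop = np.array([12.0, 15.0, 9.0, 20.0, 14.0])
X_pop = np.array([10.0, 13.0, 8.0, 16.0, 12.0])
N, n = len(Y_pop), 3
X_bar = X_pop.mean()

# Average each estimate over all C(N, n) simple random samples without replacement.
samples = [list(s) for s in itertools.combinations(range(N), n)]
avg_ratio = np.mean([ratio_estimate(Y_pop[s], X_pop[s], X_bar) for s in samples])
avg_hr = np.mean([hartley_ross(Y_pop[s], X_pop[s], X_bar, N) for s in samples])
print(Y_pop.mean(), avg_ratio, avg_hr)
```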
It is easily shown that $\hat y$ is a consistent estimator of $\bar Y$ in two senses. The first is that of Cochran (1953, p. 13): $\hat y \to \bar Y$ as $n \to N$. The second is that of Hansen, Hurwitz & Madow (1953, p. 74): $\operatorname{plim}\hat y = \bar Y$ under the restrictions that (i) as $n$ increases, $N$ increases with $n \le \theta N$, $0 < \theta < 1$, and (ii) $\bar Y$ remains constant as $N$ increases.
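In the multivariate extension we have the following model. Population:
$$Y_1, \ldots, Y_N, \quad \bar Y \text{ unknown},$$
$$X_{11}, \ldots, X_{1N}, \quad \bar X_1 \ne 0 \text{ known}, \quad R_1 = \bar Y/\bar X_1,$$
$$\vdots$$
$$X_{p1}, \ldots, X_{pN}, \quad \bar X_p \ne 0 \text{ known}, \quad R_p = \bar Y/\bar X_p,$$
and the $(p+1) \times (p+1)$ covariance matrix $S$ is known. The subscripts $0, 1, \ldots, p$ refer to $Y, X_1, \ldots, X_p$, respectively; e.g. $\rho_{01}$ is the correlation between $Y$ and $X_1$. Higher moments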


† This work was supported by the Office of Ordnance Research, U.S. Army, and the Office of Naval Research, and was carried out while the author was on leave from Michigan State University.
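will have superscripts referring to the variables and subscripts referring to the powers, e.g. (with $X_{0\alpha} = Y_\alpha$)
$$\mu^{ij}_{rs} = \frac{1}{N}\sum_{\alpha=1}^{N}(X_{i\alpha} - \bar X_i)^r(X_{j\alpha} - \bar X_j)^s,$$
$$\mu^{ijk}_{rst} = \frac{1}{N}\sum_{\alpha=1}^{N}(X_{i\alpha} - \bar X_i)^r(X_{j\alpha} - \bar X_j)^s(X_{k\alpha} - \bar X_k)^t.$$
Finally, $S_{ij} = \sum_\alpha (X_{i\alpha} - \bar X_i)(X_{j\alpha} - \bar X_j)/(N-1)$ denotes the covariance and $c_i = S_{ii}^{1/2}/\bar X_i$ the coefficient of variation.
The later development will be considerably simplified if we have a notation for moments divided by means, thus $\omega^{ij}_{rs} = \mu^{ij}_{rs}/(\bar X_i^r\bar X_j^s)$, etc.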

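A simple random sample $(y_j, x_{1j}, \ldots, x_{pj})$ $(j = 1, \ldots, n)$ from the population is observed.
The proposed ratio estimate of $\bar Y$ is
$$\hat y = w_1r_1\bar X_1 + \cdots + w_pr_p\bar X_p, \qquad (1.1)$$
where $w = (w_1, \ldots, w_p)$, $\sum w_i = 1$, is a weighting function, and $r_i = \bar y/\bar x_i$.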

As in the univariate case, $\hat y$ is in general biased, and a large-sample approximation for the mean, variance and mean square error to $O(n^{-2})$ is given in §2. Because the terms of $O(n^{-2})$ are complicated and of dubious value, only terms of $O(n^{-1})$ will be considered.
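The Hartley-Ross estimator can be generalized, so that
$$\hat y^* = \sum_{i=1}^{p}w_i\Big[\bar r_i\bar X_i + \frac{n(N-1)}{N(n-1)}\,(\bar y - \bar r_i\bar x_i)\Big]$$
is an unbiased estimator of $\bar Y$, where $n\bar r_i = \sum_j y_j/x_{ij}$.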
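A corresponding sketch of (1.1) and of the generalized Hartley-Ross estimator follows. It uses a made-up population with two auxiliary variates, an arbitrary weight vector $w$ summing to one, and illustrative function names. Enumerating all samples confirms that the weighted Hartley-Ross combination is unbiased. This is expected, because it is a linear combination of unbiased estimates with weights summing to one.

```python
import itertools
import numpy as np

def multivariate_ratio(y, xs, X_bars, w):
    """Estimate (1.1): sum_i w_i * (y_bar / x_bar_i) * X_bar_i."""
    y = np.asarray(y, float)
    xs = np.asarray(xs, float)                 # shape (p, n): one row per auxiliary variate
    r = y.mean() / xs.mean(axis=1)             # r_i = y_bar / x_bar_i
    return float(np.dot(w, r * np.asarray(X_bars, float)))

def generalized_hartley_ross(y, xs, X_bars, w, N):
    """Weighted sum of univariate Hartley-Ross estimates; unbiased when sum(w) == 1."""
    y = np.asarray(y, float)
    xs = np.asarray(xs, float)
    n = y.size
    r_bar = (y / xs).mean(axis=1)              # n * r_bar_i = sum_j y_j / x_ij
    k = n * (N - 1) / (N * (n - 1))
    parts = r_bar * np.asarray(X_bars, float) + k * (y.mean() - r_bar * xs.mean(axis=1))
    return float(np.dot(w, parts))

# Toy population with two auxiliary variates (illustrative numbers only).
Y = np.array([12.0, 15.0, 9.0, 20.0, 14.0, 11.0])
X = np.array([[10.0, 13.0, 8.0, 16.0, 12.0, 10.0],
              [9.0, 11.0, 7.0, 15.0, 12.0, 8.0]])
N, n = Y.size, 3
X_bars = X.mean(axis=1)
w = np.array([0.7, 0.3])

samples = [list(s) for s in itertools.combinations(range(N), n)]
avg_ratio = np.mean([multivariate_ratio(Y[s], X[:, s], X_bars, w) for s in samples])
avg_ghr = np.mean([generalized_hartley_ross(Y[s], X[:, s], X_bars, w, N) for s in samples])
print(Y.mean(), avg_ratio, avg_ghr)
```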


Consistency in the multivariate case (both senses) follows from the fact that we have a 
linear combination of consistent estimates. 

In §3 an ‘optimal’ weight function is considered, namely the $w$ that minimizes the variance, and some special examples are given. An estimate of $V(\hat y)$ is given in §4. Section 5 compares mean estimation under simple random sampling with ratio estimation, and univariate with multivariate ratio estimation. An example is discussed in §6, where the population consists of the numbers of inhabitants of 200 large cities in 1930 (the five largest are excluded), and a sample of size 50 is taken. Here $Y$, $X_1$ and $X_2$ are the 1950, 1940 and 1930 numbers of inhabitants.

Even in the univariate case, several different ratio estimates may be constructed if the population is stratified. Two such estimates are (i) the separate ratio estimate and (ii) the combined ratio estimate. Generalizations of (i) and (ii) are treated in §7, and asymptotic normality is discussed in §8.

2. MEAN AND VARIANCE 


From (1.1) we have
$$E\hat y = \bar Y\sum_i w_iE(r_i/R_i), \qquad V(\hat y) = \bar Y^2\sum_i\sum_j w_iw_j\operatorname{cov}(r_i/R_i,\, r_j/R_j).$$

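To obtain approximations for $E(r_i/R_i)$ and $\operatorname{cov}(r_i/R_i, r_j/R_j)$, we use the usual delta method. Let
$$\epsilon_{ij} = (x_{ij} - \bar X_i)/\bar X_i, \qquad n\epsilon_i = \sum_{j=1}^{n}\epsilon_{ij}, \quad\text{i.e.}\quad \epsilon_i = (\bar x_i - \bar X_i)/\bar X_i,$$
for $i = 0, \ldots, p$ and $j = 1, \ldots, n$ (with $x_{0j} = y_j$, $\bar X_0 = \bar Y$). If $|\epsilon_i| < 1$ for $i = 1, \ldots, p$, then
$$\frac{r_i}{R_i} = \frac{1+\epsilon_0}{1+\epsilon_i} = 1 + (\epsilon_0 - \epsilon_i)(1 - \epsilon_i + \epsilon_i^2 - \epsilon_i^3 + \cdots) = 1 + \alpha_i + \beta_i + \gamma_i + \delta_i + \cdots, \qquad (2.1)$$
where $\alpha_i = \epsilon_0 - \epsilon_i$, $\beta_i = -\epsilon_i\alpha_i$, $\gamma_i = \epsilon_i^2\alpha_i$ and $\delta_i = -\epsilon_i^3\alpha_i$.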
Remark. If the $x_i$ are positive, then $|\epsilon_i| < 1$; a detailed argument by Koop is given in Sukhatme (1954, p. 141).
The following computations are easily made (see e.g. Sukhatme, 1954):
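$$E\epsilon_i = 0, \qquad (2.2a)$$
$$nE\epsilon_i\epsilon_j = \frac{N-n}{N-1}\,\omega^{ij}_{11}, \qquad (2.2b)$$
$$n^2E\epsilon_i\epsilon_j\epsilon_k = \frac{(N-n)(N-2n)}{(N-1)(N-2)}\,\omega^{ijk}_{111}, \qquad (2.2c)$$
$$n^3E\epsilon_i\epsilon_j\epsilon_k\epsilon_l = \frac{(N-n)(N^2+N-6nN+6n^2)}{(N-1)(N-2)(N-3)}\,\omega^{ijkl}_{1111} + \frac{N(N-n)(N-n-1)(n-1)}{(N-1)(N-2)(N-3)}\big(\omega^{ij}_{11}\omega^{kl}_{11} + \omega^{ik}_{11}\omega^{jl}_{11} + \omega^{il}_{11}\omega^{jk}_{11}\big). \qquad (2.2d)$$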

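We note that the covariance matrix
$$E(\epsilon_0, \ldots, \epsilon_p)'(\epsilon_0, \ldots, \epsilon_p) = \frac{N-n}{Nn}\,C, \qquad (2.3)$$
where $C = (c_{ij})$: $(p+1)\times(p+1)$, $c_{ij} = S_{ij}/(\bar X_i\bar X_j) = \rho_{ij}c_ic_j$,
is assumed to be positive definite.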
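2.1. Approximation to $O(n^{-2})$
Using (2.1) and (2.2), and collecting terms to $O(n^{-2})$, we obtain
$$E(r_i/R_i) = 1 + \frac{1}{n}\,\frac{N-n}{N-1}\big(\omega^{ii}_{11} - \omega^{0i}_{11}\big) + \frac{1}{n^2}\Big[\frac{(N-n)(N-2n)}{(N-1)(N-2)}\big(\omega^{0ii}_{111} - \omega^{iii}_{111}\big) + \frac{3N(N-n)(N-n-1)}{(N-1)(N-2)(N-3)}\,\omega^{ii}_{11}\big(\omega^{ii}_{11} - \omega^{0i}_{11}\big)\Big]$$
$$= 1 + b_i/n + a_i/n^2, \qquad (2.4)$$
$$\operatorname{cov}(r_i/R_i,\, r_j/R_j) = E\alpha_i(\alpha_j + \beta_j + \gamma_j) + E\alpha_j(\beta_i + \gamma_i) + E\beta_i\beta_j - E\beta_i\,E\beta_j$$
$$= \frac{1}{n}\,\frac{N-n}{N-1}\,\Delta_{ij} + \frac{1}{n^2}\Big[\frac{(N-n)(N-2n)}{(N-1)(N-2)}\big(2\omega^{0ij}_{111} + \omega^{0ii}_{111} + \omega^{0jj}_{111} - \omega^{00i}_{111} - \omega^{00j}_{111} - \omega^{iij}_{111} - \omega^{ijj}_{111}\big)$$
$$\quad + \frac{N(N-n)(N-n-1)}{(N-1)(N-2)(N-3)}\big\{\Delta_{ij}(\omega^{ii}_{11} + \omega^{jj}_{11} + \omega^{ij}_{11}) + 2\eta_i\zeta_{ij} + 2\eta_j\zeta_{ji} + \eta_i\eta_j + \zeta_{ij}\zeta_{ji}\big\} - \Big(\frac{N-n}{N-1}\Big)^2\eta_i\eta_j\Big]$$
$$= a_{ij}/n + b_{ij}/n^2, \qquad (2.5)$$
where $\Delta_{ij} = \omega^{00}_{11} - \omega^{0i}_{11} - \omega^{0j}_{11} + \omega^{ij}_{11}$, $\eta_i = \omega^{0i}_{11} - \omega^{ii}_{11}$ and $\zeta_{ij} = \omega^{0i}_{11} - \omega^{ij}_{11}$.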


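If we define the vectors $b = (b_1, \ldots, b_p)$ and $a = (a_1, \ldots, a_p)$, and the matrices $A = (a_{ij})$: $p \times p$ and $B = (b_{ij})$: $p \times p$, then
$$E\hat y = \bar Y + \frac{\bar Y}{n}\,w\Big(b + \frac{a}{n}\Big)' + O(n^{-3}), \qquad (2.6)$$
$$V(\hat y) = \frac{\bar Y^2}{n}\,w\Big(A + \frac{B}{n}\Big)w' + O(n^{-3}), \qquad (2.7)$$
$$M(\hat y) = \frac{\bar Y^2}{n}\,w\Big(A + \frac{B + b'b}{n}\Big)w' + O(n^{-3}). \qquad (2.8)$$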


In the univariate case, Cochran (1940) has investigated the effects of the terms of $O(n^{-2})$. He concludes that, unless $n$ is too small, the approximation to $O(n^{-1})$ may be considered adequate. For this reason, and for simplicity, we will use the approximations to $O(n^{-1})$. Parts of the development can, however, very easily be duplicated with the following correspondences: $A + (B + b'b)/n$ in place of $A$ in $M(\hat y)$, $A + B/n$ in place of $A$ in $V(\hat y)$, and $b + a/n$ in place of $b$ in $E\hat y$. We further note that

$$b_i = \frac{N-n}{N}\,(c_i^2 - \rho_{0i}c_0c_i),$$
$$a_{ij} = \frac{N-n}{N}\,(c_0^2 - \rho_{0i}c_0c_i - \rho_{0j}c_0c_j + \rho_{ij}c_ic_j).$$


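The matrix $A$ is $n$ times the covariance matrix of $(\epsilon_0 - \epsilon_1, \ldots, \epsilon_0 - \epsilon_p)$ and is equal to $\{(N-n)/N\}\,TCT'$, where $C$ is defined in (2.3) and
$$T = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & & & \ddots & \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix}.$$
Clearly, $A$ is at least positive semi-definite. Since $T$: $p \times (p+1)$ is of rank $p$ and $C$ is positive definite, it follows that $A$ is positive definite.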
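The representation $A = \{(N-n)/N\}\,TCT'$ is easy to check numerically. This sketch builds $T$ and forms $A$ from illustrative coefficients of variation and correlations (assumed values, not from the paper). It then compares the result with the element-wise formula for $a_{ij}$ and confirms positive definiteness through the eigenvalues.

```python
import numpy as np

def make_T(p):
    """p x (p+1) matrix mapping (e_0, ..., e_p) to (e_0 - e_1, ..., e_0 - e_p)."""
    return np.hstack([np.ones((p, 1)), -np.eye(p)])

def a_matrix(C, N, n):
    """A = (N - n)/N * T C T' for the (p+1) x (p+1) matrix C = (c_ij)."""
    T = make_T(C.shape[0] - 1)
    return (N - n) / N * T @ C @ T.T

# Illustrative coefficients of variation c_0, c_1, c_2 and correlations (not from the paper).
cv = np.array([0.9, 1.0, 1.1])
R = np.array([[1.0, 0.8, 0.7],
              [0.8, 1.0, 0.6],
              [0.7, 0.6, 1.0]])
C = R * np.outer(cv, cv)                       # c_ij = rho_ij c_i c_j
N, n = 200, 50
A = a_matrix(C, N, n)

# Element-wise form a_ij = (N-n)/N (c_0^2 - rho_0i c_0 c_i - rho_0j c_0 c_j + rho_ij c_i c_j).
p = len(cv) - 1
A_direct = np.empty((p, p))
for i in range(1, p + 1):
    for j in range(1, p + 1):
        A_direct[i - 1, j - 1] = (N - n) / N * (cv[0] ** 2 - R[0, i] * cv[0] * cv[i]
                                                - R[0, j] * cv[0] * cv[j] + R[i, j] * cv[i] * cv[j])
print(np.allclose(A, A_direct), np.linalg.eigvalsh(A))
```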


3. CHOICE OF A WEIGHT FUNCTION 


The criterion for optimality of the weight vector $w = (w_1, \ldots, w_p)$, with $\sum w_i = 1$, is to minimize $V(\hat y)$. To obtain the extremum, we make use of the generalized Cauchy inequality.

LEMMA. $(xy')^2 \le (xMx')(yM^{-1}y')$,
where $M$ is a symmetric positive definite matrix. Equality holds if and only if $xM = \theta y$, where $\theta \ne 0$ is a scalar.
To apply the lemma, let $e = (1, \ldots, 1)$ and make the correspondence $x = w$, $y = e$, $M = A$. Thus
$$1 = (we')^2 \le (wAw')(eA^{-1}e'),$$
and equality is achieved if and only if $wA = \theta e$, or $w = \theta eA^{-1}$. By the restriction $we' = 1$, it follows that $\theta = 1/(eA^{-1}e')$, and hence the optimum $w$ is given by
$$\hat w = \frac{eA^{-1}}{eA^{-1}e'}. \qquad (3.1)$$
Insertion of $\hat w$ in (2.6) and (2.7) yields
$$E\hat y = \bar Y + \frac{\bar Y\,eA^{-1}b'}{n\,eA^{-1}e'}, \qquad (3.2)$$
$$V(\hat y) = \frac{\bar Y^2}{n\,eA^{-1}e'}. \qquad (3.3)$$
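The optimum (3.1), and the bound it attains, can be illustrated with a small made-up matrix $A$. Any other weights summing to one give a larger quadratic form $wAw'$, and hence, to this order, a larger $V(\hat y) = \bar Y^2wAw'/n$.

```python
import numpy as np

def optimal_weights(A):
    """Weights (3.1): w_hat = e A^{-1} / (e A^{-1} e')."""
    e = np.ones(A.shape[0])
    v = np.linalg.solve(A, e)          # A^{-1} e'; equals (e A^{-1})' because A is symmetric
    return v / v.sum()

A = np.array([[0.029, 0.020],
              [0.020, 0.042]])           # illustrative positive definite A (not from the paper)
w_hat = optimal_weights(A)
e = np.ones(2)
bound = 1.0 / (e @ np.linalg.solve(A, e))  # lower bound 1 / (e A^{-1} e') on w A w'

rng = np.random.default_rng(1)
others = []
for _ in range(1000):
    w = rng.normal(size=2)
    w = w / w.sum()                        # arbitrary weights summing to one
    others.append(w @ A @ w)
print(w_hat, w_hat @ A @ w_hat, bound, min(others) >= bound)
```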
The bias is eliminated if $eA^{-1}b' = 0$. This will hold if $b = 0$, i.e.
$$c_i = \rho_{0i}c_0 \quad\text{or}\quad \bar Y = \bar X_i\,\rho_{0i}S_0/S_i \qquad (i = 1, \ldots, p),$$
where $S_i = S_{ii}^{1/2}$. This occurs when each regression, taken individually, passes through the origin. Except for certain special cases, the expression $eA^{-1}b' = 0$ is not amenable to a simple interpretation.

The weights will be uniform if and only if the column sums of $A$ are equal, i.e. $eA = ke$, where $k \ne 0$ is a scalar ($k = 0$ would imply that $A$ is singular). Hence $eA^{-1} = e/k$ and $eA^{-1}e' = p/k$, so that $\hat w = e/p$. We also have $E\hat y = \bar Y + \bar Yeb'/(np)$ and $V(\hat y) = \bar Y^2k/(np)$.

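An example which results in uniform weighting is given by
$$c_1 = \cdots = c_p = c, \qquad \rho_{01} = \cdots = \rho_{0p} = \rho_0, \qquad \rho_{ij} = \rho \quad (i \ne j).$$
Then
$$a_{ii} = (N-n)(c_0^2 + c^2 - 2\rho_0c_0c)/N,$$
$$a_{ij} = (N-n)(c_0^2 - 2\rho_0c_0c + \rho c^2)/N,$$
$$b_i = (N-n)(c^2 - \rho_0c_0c)/N,$$
and
$$E\hat y = \bar Y + \frac{(N-n)\bar Y}{Nn}\,(c^2 - \rho_0c_0c),$$
$$V(\hat y) = \frac{(N-n)\bar Y^2}{Nnp}\,\big[c^2(1-\rho) + p(c_0^2 - 2\rho_0c_0c + \rho c^2)\big]. \qquad (3.4)$$
If, in addition, $c_0 = c$ and $\rho_0 = \rho$, then
$$E\hat y = \bar Y + \frac{(N-n)\bar Y}{Nn}\,c^2(1-\rho),$$
$$V(\hat y) = \frac{(N-n)\bar Y^2}{Nn}\,c^2(1-\rho)\,\frac{p+1}{p}.$$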


More generally, the row sums of $A$ are equal if the quantities
$$c_i\Big(\sum_{j=1}^{p}\rho_{ij}c_j - p\rho_{0i}c_0\Big) \qquad (i = 1, \ldots, p;\ \rho_{ii} = 1)$$
are equal.
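For the equicorrelated example, the following sketch builds $A$ from $TCT'$, confirms that the optimum weights are uniform, and checks the variance against (3.4). The parameter values are assumptions, chosen only so that the correlation matrix is positive definite.

```python
import numpy as np

def a_matrix(c0, c, rho0, rho, p, N, n):
    """A = (N-n)/N * T C T' for c_i = c, rho_0i = rho0, rho_ij = rho (i != j)."""
    cv = np.r_[c0, np.full(p, c)]
    R = np.full((p + 1, p + 1), rho)
    R[0, :] = R[:, 0] = rho0
    np.fill_diagonal(R, 1.0)
    C = R * np.outer(cv, cv)
    T = np.hstack([np.ones((p, 1)), -np.eye(p)])
    return (N - n) / N * T @ C @ T.T

c0, c, rho0, rho, p, N, n, Ybar = 0.8, 1.0, 0.85, 0.7, 3, 1000, 50, 1.0
A = a_matrix(c0, c, rho0, rho, p, N, n)
e = np.ones(p)
q = e @ np.linalg.solve(A, e)                  # e A^{-1} e'
w_hat = np.linalg.solve(A, e) / q
V_opt = Ybar ** 2 / (n * q)                    # (3.3)
V_34 = (N - n) * Ybar ** 2 / (N * n * p) * (c ** 2 * (1 - rho)
                                            + p * (c0 ** 2 - 2 * rho0 * c0 * c + rho * c ** 2))
print(w_hat, V_opt, V_34)
```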


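4. ESTIMATE OF $V(\hat y)$
In the univariate case an estimate of $V(\hat y)$ is obtained by first noting that
$$V(\hat y) = \frac{1}{n}\,\frac{N-n}{N}\sum_{\alpha=1}^{N}\frac{(Y_\alpha - RX_\alpha)^2}{N-1},$$
which suggests the estimate
$$v(\hat y) = \frac{1}{n}\,\frac{N-n}{N}\sum_{j=1}^{n}\frac{(y_j - rx_j)^2}{n-1}.$$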
For the multivariate extension we rewrite the matrix $A$ in a form conformable with the above. Ignoring the finite population correction $(N-n)/N$,
$$a_{ij} = c_0^2 - \rho_{0i}c_0c_i - \rho_{0j}c_0c_j + \rho_{ij}c_ic_j = \frac{1}{\bar Y^2}\sum_{\alpha=1}^{N}\frac{(Y_\alpha - R_iX_{i\alpha})(Y_\alpha - R_jX_{j\alpha})}{N-1}.$$
Similarly, we can rewrite $b$ as
$$b_i = c_i^2 - \rho_{0i}c_0c_i = \frac{1}{\bar Y\bar X_i}\sum_{\alpha=1}^{N}\frac{X_{i\alpha}(R_iX_{i\alpha} - Y_\alpha)}{N-1}.$$
Thus we estimate $a_{ij}$ and $b_i$ by
$$\hat a_{ij} = \frac{1}{\bar y^2}\sum_{k=1}^{n}\frac{(y_k - r_ix_{ik})(y_k - r_jx_{jk})}{n-1}, \qquad \hat b_i = \frac{1}{\bar y\bar x_i}\sum_{k=1}^{n}\frac{x_{ik}(r_ix_{ik} - y_k)}{n-1}.$$
This allows $V(\hat y) = \bar Y^2/(n\,eA^{-1}e')$ to be estimated by $v(\hat y) = \bar y^2/(n\,e\hat A^{-1}e')$. Similarly, we may estimate $\hat w$ if $A$ is unknown. In general these estimates will be biased, the bias being of $O(n^{-1})$ (Cochran, 1953, p. 119). If the $c_i$ are small, the bias is negligible.

5. COMPARISON OF PROCEDURES
5.1. Mean estimation using simple random sampling
It is known (Cochran, 1953) that univariate ratio estimation is superior to mean estimation, in the sense of smaller variance, provided
$$\rho_{xy} > \frac{c_x}{2c_y}.$$
In particular, if $c_x = c_y$, then $\rho_{xy} > \tfrac12$ makes the ratio estimate superior. If $c_i = c$, $\rho_{0i} = \rho_0$ and $\rho_{ij} = \rho$ $(i \ne j)$, the pertinent variances from (3.4), omitting the f.p.c., are
$$V(\bar y) = \frac{\bar Y^2c_0^2}{n}, \qquad V(\hat y) = \frac{\bar Y^2}{np}\,\big[c^2(1-\rho) + pc_0^2 - 2p\rho_0c_0c + p\rho c^2\big].$$
This leads to the criterion that the ratio estimate is superior to simple random sampling if
$$\rho_0 > \frac{[1 + (p-1)\rho]\,c}{2pc_0}.$$
If, in addition, $\rho_0 = \rho$ and $c_0 = c$, the criterion simplifies to
$$\rho > \frac{1}{p+1}.$$
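The criterion can be checked directly against the two variance formulas. In this sketch the parameters are illustrative and the f.p.c. is omitted, as in the text. In the symmetric case $c_0 = c$, $\rho_0 = \rho$ with $p = 3$, the comparison should switch from favouring mean estimation to favouring ratio estimation at $\rho = 1/(p+1) = 0.25$.

```python
def variances(c0, c, rho0, rho, p, n, Ybar=1.0):
    """V(y_bar) and V(y_hat) of Section 5.1 (equal c_i, rho_0i, rho_ij; f.p.c. omitted)."""
    v_mean = Ybar ** 2 * c0 ** 2 / n
    v_ratio = Ybar ** 2 / (n * p) * (c ** 2 * (1 - rho) + p * c0 ** 2
                                     - 2 * p * rho0 * c0 * c + p * rho * c ** 2)
    return v_mean, v_ratio

def ratio_is_better(c0, c, rho0, rho, p):
    """Criterion rho0 > [1 + (p - 1) rho] c / (2 p c0)."""
    return rho0 > (1 + (p - 1) * rho) * c / (2 * p * c0)

# With c0 = c and rho0 = rho the threshold is 1/(p + 1), i.e. 0.25 for p = 3.
for rho in (0.24, 0.26):
    v_mean, v_ratio = variances(1.0, 1.0, rho, rho, p=3, n=50)
    print(rho, ratio_is_better(1.0, 1.0, rho, rho, 3), v_ratio < v_mean)
```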
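5.2. Univariate versus multivariate estimation
In this section we consider the consequences of using the set of auxiliary variables $x_1, \ldots, x_p$ rather than the set $x_1, \ldots, x_p, x_{p+1}, \ldots, x_{p+q}$. The result is given in the following theorem.
THEOREM. Let $V(\hat y \mid p)$ and $V(\hat y \mid p, q)$ denote the variances of $\hat y$ based on the auxiliary variables $x_1, \ldots, x_p$ and $x_1, \ldots, x_{p+q}$, respectively. With optimum weights $\hat w$, $V(\hat y \mid p) \ge V(\hat y \mid p, q)$.
Proof. With weighting $\hat w$, we have
$$V(\hat y \mid p) = \frac{\bar Y^2}{n\,eA_1^{-1}e'}, \qquad V(\hat y \mid p, q) = \frac{\bar Y^2}{n\,eA^{-1}e'}, \qquad A = \begin{pmatrix} A_1 & B' \\ B & C \end{pmatrix}.$$
We must show that $eA_1^{-1}e' \le eA^{-1}e'$. Strictly, the vector $e = (1, \ldots, 1)$ should carry an indicator $p$ or $q$ to show its dimension, but this is omitted for simplicity. From
$$A^{-1} = \begin{pmatrix} A_1^{-1} + DFD' & -DF \\ -FD' & F \end{pmatrix},$$
where $F = (C - BA_1^{-1}B')^{-1}$ and $D = A_1^{-1}B'$, we have after simplification
$$eA^{-1}e' - eA_1^{-1}e' = (eD - e)F(eD - e)' \ge 0.$$
The inequality follows because $F$ is positive definite, so that $uFu' \ge 0$ for all $u$.
In the special case considered previously, the difference in the variances is
$$V(\hat y \mid p) - V(\hat y \mid p, q) = \frac{(N-n)\bar Y^2c^2(1-\rho)}{Nn}\Big(\frac{1}{p} - \frac{1}{p+q}\Big).$$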
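The key identity in the proof can be verified numerically. The sketch uses a randomly generated positive definite $A$, an arbitrary test matrix rather than data from the paper, partitioned with lower-left block $B$ $(q \times p)$, $D = A_1^{-1}B'$ and $F = (C - BA_1^{-1}B')^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 2, 2
L = rng.normal(size=(p + q, p + q))
A = L @ L.T + (p + q) * np.eye(p + q)          # random positive definite matrix (illustrative)

A1 = A[:p, :p]                                 # block for x_1, ..., x_p
B = A[p:, :p]                                  # q x p lower-left block, so A[:p, p:] = B'
C = A[p:, p:]
D = np.linalg.solve(A1, B.T)                   # D = A1^{-1} B'
F = np.linalg.inv(C - B @ D)                   # F = (C - B A1^{-1} B')^{-1}

e_p, e_q, e = np.ones(p), np.ones(q), np.ones(p + q)
gain = e @ np.linalg.solve(A, e) - e_p @ np.linalg.solve(A1, e_p)
u = e_p @ D - e_q
print(gain, u @ F @ u)
```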
6. AN EXAMPLE 


with $Y$ = 1950, $X_1$ = 1940, $X_2$ = 1930.


ue of Y. We now compare the various estimates. 


(i i 2088f, where f? = (N- / nN. 
(ii) Ratio estimate, one auxiliary variate 


„ „1896 "s 
J = rX = 169g 1482 1660, o(j)- 2s9f. 


(iii) Ratio estimate, two auxiliary variates, with

(a) true weights: $\hat y = 2r_1\bar X_1 - r_2\bar X_2 = 1681$, $\sigma(\hat y) = 277$;

(b) estimated weights: $\hat y = 2.38r_1\bar X_1 - 1.38r_2\bar X_2 = 1689$.


Since $c_1$ is close to $c_0$ and $\rho_{01} = 0.987$, ratio estimation is clearly superior to mean estimation. Similarly, ratio estimation with two variates is preferable to ratio estimation with one variate.
Table 1. Numbers of inhabitants (in thousands) in a random sample of 50 large cities
in the U.S. in 1930, 1940, 1950


1930  1940  1950  1930  1940  1950
410 672 e" 20 Tos 316 
104 101 116 m E T 
50 54 95 s m “ | 
202 385 593 451 m tot | 
130 | 173 204 ne n 131 
55 | LT] 58 [7] 59 6n 
102 97 130 328 325 332 
54 58 70 781 T 601 
52 62 87 100 101 vi 
71 69 68 5 54 55 
55 50 51 106 112 125 
900 878 915 156 152 163 
47 48 53 578 887 637 
79 82 84 75 78 tT 
50 49 54 63 65 ™ 
115 115 12 105 108 n 
55 5 60 51 46 58 
113 110 109 46 5l LU 
65 70 82 195 194 203 
64 62 63 364 387 427 
65 67 74 102 101 102 
46 49 56 114 111 121 
148 203 334 63 63 64 
115 110 113 308 319 369 
62 71 70 54 59 74 


The following is a summary of the pertinent results for the population and the corresponding sample estimates.


1213 1-241 1:266 


1.049 1059 I 
e 1098 1-108 P308 1335 
1-131 1:381 
A 0-029 0-042 0:033 toa 
[ 0-068 0-082 
b (0.039 0-075) (0-061 0-125) 
D (2 -1) (2:38 — 1-38) 
Biom. 45 


7. STRATIFIED SAMPLING

We first recapitulate the theory for a single auxiliary variate where the population consists of $g$ strata. Two procedures are usually considered:

(i) A separate ratio estimate of $\bar Y^{(j)}$ is made for each stratum, and these are then combined. Specifically,
$$\hat y_s = \sum_{j=1}^{g}\frac{N_j}{N}\,r^{(j)}\bar X^{(j)},$$
where $r^{(j)} = \bar y^{(j)}/\bar x^{(j)}$ $(j = 1, \ldots, g)$.
This estimate is called the separate ratio estimate.


(ii) The conventional stratified sample estimates of $\bar Y$ and $\bar X$ are made, and their ratio is formed. Specifically,
$$\hat y_c = \frac{\bar y_{st}}{\bar x_{st}}\,\bar X, \quad\text{where}\quad \bar y_{st} = \sum_{j=1}^{g}\frac{N_j}{N}\,\bar y^{(j)}, \quad \bar x_{st} = \sum_{j=1}^{g}\frac{N_j}{N}\,\bar x^{(j)}.$$
This estimate is called the combined ratio estimate (Hansen, Hurwitz & Gurney, …). For each procedure an optimum allocation of the $n_j$ can be determined.
We now consider the generalization to the multivariate case. Some of the previous theory carries through unchanged, but in general some modifications will be required. We use a notation similar to that employed for the unstratified case, except that the stratum under consideration is denoted by a superfix. For example, $\bar Y^{(j)}$ denotes the mean of $Y$ in the $j$th stratum, $\bar X_i^{(j)}$ the mean of $X_i$ $(i = 1, \ldots, p)$, $S^{(j)}$ the covariance matrix within the $j$th stratum, and so on $(j = 1, \ldots, g)$.


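7.1. Separate ratio estimate
Let
$$\hat y^{(j)} = w_1^{(j)}r_1^{(j)}\bar X_1^{(j)} + \cdots + w_p^{(j)}r_p^{(j)}\bar X_p^{(j)}, \qquad \sum_i w_i^{(j)} = 1 \quad (j = 1, \ldots, g),$$
be a ratio estimate of $\bar Y^{(j)}$, the mean of stratum $j$. We now form the linear combination of the strata means
$$\hat y_s = \sum_{j=1}^{g}\frac{N_j}{N}\,\hat y^{(j)}$$
as an estimate of $\bar Y$.

The $g$ weight vectors $w^{(j)} = (w_1^{(j)}, \ldots, w_p^{(j)})$ $(j = 1, \ldots, g)$ are chosen to minimize $V(\hat y_s)$. Since the components of $\hat y_s$ are uncorrelated, $V(\hat y_s)$ is additive, and we may minimize $V(\hat y^{(j)})$ separately for each $j = 1, \ldots, g$. Thus in this case the previous results remain valid for each component of $\hat y_s$. The mean and variance of $\hat y_s$ are then obtained by combining the results for the components in the obvious way.

The optimum allocation of $n_1, \ldots, n_g$ is determined by minimizing $V(\hat y_s)$ subject to fixed cost, i.e. with $\sum_j a_jn_j$ constant. The result is easily shown to be
$$n_j \propto \frac{N_j\bar Y^{(j)}}{\sqrt{a_j\,eA^{(j)-1}e'}},$$
where $A^{(j)}$ is the matrix $A$ for stratum $j$, computed without the factor $(N_j - n_j)/N_j$.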
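The following sketches this allocation rule with invented strata. Here `q_j` stands for $eA^{(j)-1}e'$ and `cost_j` for the cost coefficients $a_j$; both names are hypothetical. Moving along the budget constraint in either direction increases the part of $V(\hat y_s)$ that depends on the $n_j$, as it should at the optimum.

```python
import numpy as np

def optimum_allocation(N_j, Ybar_j, q_j, cost_j, budget):
    """n_j proportional to N_j * Ybar_j / sqrt(cost_j * q_j), scaled so sum(cost_j * n_j) = budget.

    q_j stands for e A^(j)^{-1} e' computed without the finite population correction."""
    N_j, Ybar_j, q_j, cost_j = (np.asarray(a, float) for a in (N_j, Ybar_j, q_j, cost_j))
    k = N_j * Ybar_j / np.sqrt(cost_j * q_j)
    return budget * k / np.sum(cost_j * k)

def variance_part(n_j, N_j, Ybar_j, q_j):
    """Sum over strata of (N_j/N)^2 Ybar_j^2 / (n_j q_j): the part of V(y_s) depending on n_j."""
    W = N_j / N_j.sum()
    return np.sum(W ** 2 * Ybar_j ** 2 / (n_j * q_j))

# Illustrative strata (not from the paper).
N_j = np.array([400.0, 300.0, 300.0])
Ybar_j = np.array([50.0, 80.0, 120.0])
q_j = np.array([20.0, 15.0, 10.0])
cost_j = np.array([1.0, 2.0, 4.0])
budget = 200.0

n_opt = optimum_allocation(N_j, Ybar_j, q_j, cost_j, budget)
d = np.array([2.0, -1.0, 0.0])                 # sum(cost_j * d) = 0: stays on the budget line
print(n_opt, variance_part(n_opt, N_j, Ybar_j, q_j), variance_part(n_opt + d, N_j, Ybar_j, q_j))
```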


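7.2. Combined ratio estimate
The usual estimates appropriate to stratified sampling are
$$\bar y_{st} = \sum_{j=1}^{g}\frac{N_j}{N}\,\bar y^{(j)}, \qquad \bar x_{i,st} = \sum_{j=1}^{g}\frac{N_j}{N}\,\bar x_i^{(j)} \quad (i = 1, \ldots, p).$$
The combined ratio estimate for the multivariate case is the linear combination
$$\hat y_c = \sum_{i=1}^{p}f_i\,r_i^*\bar X_i, \qquad r_i^* = \bar y_{st}/\bar x_{i,st}, \qquad \sum_i f_i = 1,$$
where $f = (f_1, \ldots, f_p)$ is a weighting vector. Let
$$\bar y_{st} = \bar Y(1 + \epsilon_0), \qquad \bar x_{i,st} = \bar X_i(1 + \epsilon_i),$$
where $E\epsilon_0 = E\epsilon_i = 0$ and, writing $X_0 = Y$,
$$E\epsilon_i\epsilon_k = \sum_{j=1}^{g}\Big(\frac{N_j}{N}\Big)^2\frac{N_j - n_j}{N_jn_j}\,\frac{S^{(j)}_{ik}}{\bar X_i\bar X_k} \qquad (i, k = 0, 1, \ldots, p).$$
As before,
$$r_i^*\bar X_i = \bar Y\,\frac{1+\epsilon_0}{1+\epsilon_i} = \bar Y(1 + \alpha_i^* + \beta_i^* + \cdots),$$
with $\alpha_i^* = \epsilon_0 - \epsilon_i$ and $\beta_i^* = -\epsilon_i(\epsilon_0 - \epsilon_i)$. Here $\alpha_i^*$ and $\beta_i^*$ play the roles of $\alpha_i$ and $\beta_i$ in §2. Clearly $E\alpha_i^* = 0$, so that
$$E\hat y_c = \bar Y + \bar Y\sum_i f_iE\beta_i^*,$$
where
$$E\beta_i^* = \sum_{j=1}^{g}\Big(\frac{N_j}{N}\Big)^2\frac{N_j - n_j}{N_jn_j}\Big(\frac{S^{(j)}_{ii}}{\bar X_i^2} - \frac{S^{(j)}_{0i}}{\bar Y\bar X_i}\Big).$$
Similarly we obtain, to the same order,
$$\operatorname{cov}(r_i^*/R_i,\, r_k^*/R_k) = E\alpha_i^*\alpha_k^* = \sum_{j=1}^{g}\Big(\frac{N_j}{N}\Big)^2\frac{N_j - n_j}{N_jn_j}\Big(\frac{S^{(j)}_{00}}{\bar Y^2} - \frac{S^{(j)}_{0i}}{\bar Y\bar X_i} - \frac{S^{(j)}_{0k}}{\bar Y\bar X_k} + \frac{S^{(j)}_{ik}}{\bar X_i\bar X_k}\Big).$$
Hence $V(\hat y_c) = \bar Y^2\sum_i\sum_k f_if_kE\alpha_i^*\alpha_k^*$.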
If we let $A = (a_{ik})$: $p \times p$, with $a_{ik} = E\alpha_i^*\alpha_k^*$, and $b = (E\beta_1^*, \ldots, E\beta_p^*)$, then $V(\hat y_c) = \bar Y^2fAf'$ and $E\hat y_c = \bar Y + \bar Yfb'$. Formally we have reduced the problem to the previous framework, so the previous theory remains valid. For example, the weight vector that minimizes $V(\hat y_c)$ is $f = eA^{-1}/(eA^{-1}e')$.

For any fixed weight vector $f$, the optimum allocation of $n_1, \ldots, n_g$ subject to $\sum_j l_jn_j = l$ is easily obtained, namely,
$$n_j \propto \frac{N_j}{\sqrt{l_j}}\Big[\sum_i\sum_k f_if_k\big(S^{(j)}_{00} - R_iS^{(j)}_{0i} - R_kS^{(j)}_{0k} + R_iR_kS^{(j)}_{ik}\big)\Big]^{1/2}.$$
Obtaining an optimum weight vector and an optimum allocation simultaneously is, however, somewhat involved. It requires maximizing $e(A^{(1)}/n_1 + \cdots + A^{(g)}/n_g)^{-1}e'$ subject to $\sum_j l_jn_j = l$, where $A = \sum_j A^{(j)}/n_j$ when the finite population corrections are ignored.


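8. ASYMPTOTIC NORMALITY
We first consider the result for an infinite population. Let
$$H(\bar y, \bar x_1, \ldots, \bar x_p) = \sum_{i=1}^{p}w_i\,\frac{\bar y}{\bar x_i}\,\bar X_i,$$
$$H_0 = H(\bar Y, \bar X_1, \ldots, \bar X_p) = \bar Y,$$
$$\frac{\partial H}{\partial\bar y}\Big|_0 = 1, \qquad \frac{\partial H}{\partial\bar x_j}\Big|_0 = -\frac{w_j\bar Y}{\bar X_j} \qquad (j = 1, \ldots, p),$$
where $|_0$ denotes evaluation at $(\bar y, \bar x_1, \ldots, \bar x_p) = (\bar Y, \bar X_1, \ldots, \bar X_p)$. Cramér (1946, p. 366) shows the following: if $H$ is continuous and has continuous first and second partial derivatives in some neighbourhood of this point, then $H$ is asymptotically normal with mean $H_0 = \bar Y$ and variance
$$\frac{\bar Y^2}{n}\Big[c_0^2 - 2\sum_{i=1}^{p}\rho_{0i}c_0c_iw_i + \sum_{i=1}^{p}\sum_{j=1}^{p}\rho_{ij}c_ic_jw_iw_j\Big],$$
which is the same as (2.7) to $O(n^{-1})$. Since we have assumed that $\bar X_i \ne 0$ and that $y$ and the $x_i$ have finite variances, the theorem can be applied to $H = \hat y$.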
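Numerically, the delta-method variance coincides with $\bar Y^2wAw'/n$ for an infinite population (no f.p.c.). The check below uses an arbitrary covariance matrix for $(Y, X_1, X_2)$ and weights summing to one; all numbers are illustrative.

```python
import numpy as np

def delta_variance(w, Ybar, Xbars, Sigma, n):
    """(1/n) g' Sigma g, g = gradient of H = sum_i w_i * y_bar * X_bar_i / x_bar_i at the means."""
    w, Xbars = np.asarray(w, float), np.asarray(Xbars, float)
    g = np.r_[w.sum(), -w * Ybar / Xbars]      # dH/dy_bar = sum w_i = 1; dH/dx_bar_i = -w_i Ybar / X_bar_i
    return g @ Sigma @ g / n

def variance_27(w, Ybar, Xbars, Sigma, n):
    """Ybar^2 w A w' / n with A = T C T' (infinite population, so no f.p.c.)."""
    means = np.r_[Ybar, Xbars]
    C = Sigma / np.outer(means, means)         # c_ij = S_ij / (X_bar_i X_bar_j)
    p = len(Xbars)
    T = np.hstack([np.ones((p, 1)), -np.eye(p)])
    A = T @ C @ T.T
    w = np.asarray(w, float)
    return Ybar ** 2 * (w @ A @ w) / n

# Illustrative moments for (Y, X_1, X_2) (not from the paper).
Sigma = np.array([[4.0, 3.0, 2.5],
                  [3.0, 3.5, 2.0],
                  [2.5, 2.0, 3.0]])
w, Ybar, Xbars, n = np.array([1.5, -0.5]), 10.0, np.array([8.0, 6.0]), 40
print(delta_variance(w, Ybar, Xbars, Sigma, n), variance_27(w, Ybar, Xbars, Sigma, n))
```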

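We now turn to the finite population case. Let $U^{(1)}, U^{(2)}, \ldots, U^{(m)}, \ldots$ be a sequence of universes, where the elements of $U^{(m)} = (U_1^{(m)}, \ldots, U_{N_m}^{(m)})$ are $(Y_\alpha^{(m)}, X_{1\alpha}^{(m)}, \ldots, X_{p\alpha}^{(m)})$, and suppose that the $U^{(m)}$ satisfy a certain regularity condition. Let
$$z_i^{(m)} = \frac{\bar x_i^{(m)} - \bar X_i^{(m)}}{\sigma(\bar x_i^{(m)})} \qquad (i = 0, 1, \ldots, p),$$
and suppose that $\lim_{m\to\infty}\rho_{ij}^{(m)} = \rho_{ij}$ $(i, j = 0, 1, \ldots, p)$. Then the limiting distribution of $(z_0^{(m)}, \ldots, z_p^{(m)})$ is multivariate normal with zero means, unit variances and covariance matrix $(\rho_{ij})$.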
This is essentially Theorem 3 of Madow (1948, p. 544). The regularity condition is his condition W (p. 539), which is satisfied, for example, if all the elements of $U$ are uniformly bounded. Madow's Theorem 3 gives the asymptotic normality of means when the sample is selected from a finite population without replacement. Combined with Cramér's theorem mentioned above, it yields the asymptotic normality of $\hat y$.
NON-RANDOMNESS IN A SEQUENCE OF TWO ALTERNATIVES
II. WILCOXON'S AND ALLIED TEST STATISTICS


By D. E. BARTON, F. N. DAVID AND C. L. MALLOWS
University College London 


The $\phi$ alternative was developed from the method of paired comparisons, and it is this method which we use here. It is assumed that there is a random sequence of two alternatives, $r_1$ of one kind ($x$) and $r_2$ of another ($y$), with $r_1 + r_2 = r$. This sequence may be assumed to have been arrived at by comparing each $x$ with each $y$ and with every other $x$, and similarly for each $y$. Thus ${}^rC_2$ comparisons will have been made, all independent. Suppose these comparisons produce an inconsistent sequence; for example, the first $x$ picked up may be judged $y_1 < x_1 < y_2$, while the second $x$ is judged $y_2 < x_2 < y_1$. The sequence is then rejected and a further ${}^rC_2$ comparisons are made. This procedure may be supposed repeated until a consistent set of results is obtained. As Mallows emphasized in an earlier paper, we do not suggest that the person doing the ranking actually performs the experiment in this fashion. Rather, the mental process he goes through leads to results equivalent to those obtained from the model.
Various assumptions can be made for the $\phi$ model. We give three here, all of which lead to the same mathematical set-up. It can be supposed:


7a women, all of the same age. The judge, asked to arrange this set in order of increasing aj 
will, if he is without bias, 
conscious bias, he may 


s. Under the alternate hypothesis het 
hat an x is ranked lower than a y, but as before, the arrangement € 
(iii) A variant on model (ii) is to assume an absolute underlying ranking for all $r_1 + r_2$ elements, with all the $x$'s ranked lower than all the $y$'s. Under the null hypothesis the ranks are allotted at random. Under the alternate hypothesis every $x$ is compared with every other $x$, every $y$ with every other $y$, and every $x$ with every $y$. In each of the independent ${}^rC_2$ comparisons there is assumed to be a probability $p$ $(> \tfrac12)$ of getting the two elements in the correct order. Thus we assume, for example, a series of photographs of men and women, all of different ages. If the arranger has no competence in judging age, the result is a random sequence; the more competent the arranger, the greater $p$ will be.

Suppose that there are $r_1$ elements ($x$) and $r_2$ elements ($y$), $r_1 \ge r_2$, all ranked. Let $S$ be the sum of the ranks of the $r_2$ $y$'s and, following Mann & Whitney (1947), let $U$ be the number of times an $x$ is ranked above a $y$. We have
$$U = r_1r_2 + \tfrac12r_2(r_2+1) - S.$$
Let $p$ be the probability that an $x$ is ranked below a $y$, and $q = 1 - p$. Then, if $i_1, i_2, \ldots, i_{r_2}$ are the individual ranks of the $y$'s,
$$P(i_1, i_2, \ldots, i_{r_2}) \propto p^{r_1r_2 - U}q^U \propto \phi^U \propto \phi^{-S},$$
where $\phi = q/p$.† It follows that $U$ (or $S$) is sufficient for $p$ under the set of $\phi$ alternatives. Let $P_\phi(U)$ be the probability of a given value of $U$ under the alternate hypothesis, and $P_1(U)$ the corresponding probability under the null hypothesis. Then
$$P_\phi(U) = \frac{\phi^UP_1(U)}{\sum_U\phi^UP_1(U)}.$$
Further, if the probability generating function (p.g.f.) under the null hypothesis is
$$G_1(t) = \sum_U P_1(U)\,t^U,$$
then the p.g.f. for the non-null model is
$$G_\phi(t) = \sum_U P_\phi(U)\,t^U = \frac{G_1(\phi t)}{G_1(\phi)}.$$


easily perhaps by means of a generating function. Consider the coefficient of $t^{r_2}$ in the expansion of
$$\prod_{i=1}^{r}(1 + h^it).$$
This coefficient is a polynomial in $h$:‡ the powers of $h$ are the possible values of $S$, and each numerical multiplying factor is the corresponding frequency. For example, for $r_1 = 4$, $r_2 = 3$ we have
$$\prod_{i=1}^{7}(1 + h^it),$$
and the coefficient of $t^3$ is
$$h^6 + h^7 + 2h^8 + 3h^9 + 4h^{10} + 4h^{11} + 5h^{12} + 4h^{13} + 4h^{14} + 3h^{15} + 2h^{16} + h^{17} + h^{18}.$$
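The generating-function calculation is easy to mechanize: multiply out $\prod(1 + h^it)$, keeping track of the powers of $t$ and $h$. The pure-Python sketch below does this (the function name is ours). For $r_1 = 4$, $r_2 = 3$ it returns the frequencies 1, 1, 2, 3, 4, 4, 5, 4, 4, 3, 2, 1, 1 for $S = 6, \ldots, 18$.

```python
def rank_sum_counts(r1, r2):
    """Frequencies of S: coefficients of t**r2 * h**S in prod_{i=1}^{r} (1 + h**i * t)."""
    r = r1 + r2
    max_s = r2 * (2 * r - r2 + 1) // 2          # sum of the r2 largest ranks
    # poly[k][s] holds the coefficient of t**k * h**s in the product so far.
    poly = [[0] * (max_s + 1) for _ in range(r2 + 1)]
    poly[0][0] = 1
    for i in range(1, r + 1):                   # multiply by (1 + h**i * t)
        for k in range(min(i, r2), 0, -1):      # descending k keeps row k - 1 not yet updated
            for s in range(max_s, i - 1, -1):
                poly[k][s] += poly[k - 1][s - i]
    s_min = r2 * (r2 + 1) // 2                  # sum of the r2 smallest ranks
    return {s: poly[r2][s] for s in range(s_min, max_s + 1)}

counts = rank_sum_counts(4, 3)
print(counts)
```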
+ Previously Mallows defined $*=4/P-_ y : e aae by. 
‡ An explicit expression for this polynomial as an algebraic function is given by P. A. MacMahon in Combinatory Analysis, 2, Art.
Given the critical region for $S$, therefore, the power of $S$ against a given $\phi$ may be calculated. Table 1 (p. 178) gives the power of $S$ for several values of $\phi$ for the sequences (5, 5), (6, 4) and (7, 3).
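Given the null frequencies of $S$, the power under a $\phi$ alternative follows from $P_\phi(U) \propto \phi^UP_1(U)$. The Python sketch below enumerates arrangements directly rather than going through the p.g.f. The one-sided region $S \ge 36$ for the (5, 5) sequence is a hypothetical choice, with size $12/252 \approx 0.048$; it is not necessarily a region used in Table 1.

```python
import itertools
from math import comb

def null_counts(r1, r2):
    """Number of equally likely arrangements giving each rank sum S of the r2 y's."""
    counts = {}
    for ranks in itertools.combinations(range(1, r1 + r2 + 1), r2):
        s = sum(ranks)
        counts[s] = counts.get(s, 0) + 1
    return counts

def power_of_S(r1, r2, phi, critical):
    """P(S in critical | phi), using P_phi(U) proportional to phi**U * P_1(U),
    with U = r1*r2 + r2*(r2 + 1)/2 - S."""
    counts = null_counts(r1, r2)
    const = r1 * r2 + r2 * (r2 + 1) // 2
    weight = {s: c * phi ** (const - s) for s, c in counts.items()}
    total = sum(weight.values())
    return sum(weight.get(s, 0.0) for s in critical) / total

# Hypothetical one-sided region S >= 36 for the (5, 5) sequence.
region = range(36, 41)
size = sum(null_counts(5, 5).get(s, 0) for s in region) / comb(10, 5)
for phi in (1.0, 0.8, 0.5):
    print(phi, round(power_of_S(5, 5, phi, region), 4))
```

Values of $\phi$ below unity favour large $S$ (the $y$'s tend to be ranked above the $x$'s), which is why an upper-tail region is used here.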


When $r_1$ and $r_2$ become too large for the p.g.f. to be manipulated easily, other methods based on moment approximations are possible. The cumulant generating function under the null hypothesis is
$$K_1(h) = \log\Big[\text{coefficient of } t^{r_2} \text{ in } \prod_{j=1}^{r}(1 + e^{jh}t)\Big] - \log{}^rC_{r_2} = \kappa_1h + \kappa_2\frac{h^2}{2!} + \kappa_3\frac{h^3}{3!} + \cdots.$$
Write $\delta = \log\phi$. The cumulant generating function under the alternate hypothesis is then
$$K_\phi(h) = K_1(h - \delta) - K_1(-\delta).$$
Expand both sides and equate like powers of $h$. Let $\kappa_v(\phi)$ denote the $v$th cumulant of $S$ under the $\phi$ alternative, and $\kappa_v$ the $v$th cumulant under the null hypothesis. Then
$$\kappa_v(\phi) = \kappa_v - \delta\,\kappa_{v+1} + \frac{\delta^2}{2!}\kappa_{v+2} - \frac{\delta^3}{3!}\kappa_{v+3} + \cdots.$$
The first 12 cumulants under the null hypothesis have been tabled by Haldane & … (1948). The lower-order cumulants given by them are as follows:
$$\kappa_1 = \tfrac12r_2(r+1), \qquad \kappa_2 = \frac{r_1r_2(r+1)}{12},$$
$$\kappa_4 = -\frac{r_1r_2(r+1)}{120}\,[r(r+1) - r_1r_2],$$
$$\kappa_6 = \frac{r_1r_2(r+1)}{504}\,[r(r+1)(2r^2 + 2r - 1) - r(4r+5)r_1r_2 + 2r_1^2r_2^2],$$
$$\kappa_{2v+1} = 0 \qquad (v \ge 1).$$
$\kappa_{2v}$ is of order $(2v+1)$ in $r$, so $\kappa_{2v}/\kappa_2^v$ is of order $-(v-1)$ in $r$. (This indicates how rapidly the distribution of $S$ approaches normality as $r$ increases.) Because the odd null cumulants above the first vanish, under the alternate hypothesis we have
$$\kappa_1(\phi) = \kappa_1 - \delta\kappa_2 - \frac{\delta^3}{3!}\kappa_4 - \cdots,$$
$$\kappa_{2v}(\phi) = \kappa_{2v} + \frac{\delta^2}{2!}\kappa_{2v+2} + \frac{\delta^4}{4!}\kappa_{2v+4} + \cdots,$$
while
$$\kappa_{2v+1}(\phi) = -\delta\kappa_{2v+2} - \frac{\delta^3}{3!}\kappa_{2v+4} - \cdots \qquad (v \ge 1).$$
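The cumulant formulas can be checked against the exact null distribution of $S$ using exact rational arithmetic. This sketch computes $\kappa_2$, $\kappa_4$ and $\kappa_6$ from all ${}^rC_{r_2}$ arrangements and compares them with the closed forms above.

```python
import itertools
from fractions import Fraction

def exact_cumulants(r1, r2):
    """kappa_2, kappa_4, kappa_6 of S from its exact null distribution."""
    sums = [sum(c) for c in itertools.combinations(range(1, r1 + r2 + 1), r2)]
    total = len(sums)
    mean = Fraction(sum(sums), total)
    m2, m4, m6 = (sum((s - mean) ** k for s in sums) / total for k in (2, 4, 6))
    # Odd central moments vanish because the null distribution of S is symmetric.
    return m2, m4 - 3 * m2 ** 2, m6 - 15 * m4 * m2 + 30 * m2 ** 3

def formula_cumulants(r1, r2):
    """The closed forms for kappa_2, kappa_4, kappa_6 quoted in the text."""
    r = r1 + r2
    a = Fraction(r1 * r2 * (r + 1))
    k2 = a / 12
    k4 = -a * (r * (r + 1) - r1 * r2) / 120
    k6 = a * (r * (r + 1) * (2 * r ** 2 + 2 * r - 1) - r * (4 * r + 5) * r1 * r2
              + 2 * r1 ** 2 * r2 ** 2) / 504
    return k2, k4, k6

for r1, r2 in [(4, 3), (5, 5), (6, 4)]:
    print((r1, r2), exact_cumulants(r1, r2) == formula_cumulants(r1, r2))
```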
Using the moments of $S$ under $\phi$, two procedures are possible. First, we may take a suitable functional form and, using as many cumulants as desired, estimate the power of $S$ under the given $\phi$ alternative. A second, simpler, procedure uses the fact that the distribution of $S$ quickly becomes approximately normal as $r$ increases. We may therefore assume, for small departures of $\phi$ from unity, that $S$ is also normally distributed under $\phi$, with mean and variance
$$\kappa_1(\phi) \simeq \tfrac12r_2(r+1) - \frac{r_1r_2(r+1)}{12}\log\phi, \qquad \kappa_2(\phi) \simeq \frac{r_1r_2(r+1)}{12},$$
respectively. The power of the test may then be calculated in the usual way. Because these approximations are adequate for $\phi$ not very different from unity, we do not discuss here the other approximation series that may be obtained for the moments as functions of $\phi$. Several variants of the procedure we have described are clearly possible, chiefly through the expansion of $S$.
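The following sketches the normal approximation for an upper-tail test, using the mean and variance above. The continuity correction of one half is our own addition, not part of the text, and the approximation is intended only for $\phi$ near unity.

```python
from math import erf, log, sqrt

def normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def approx_power_upper(r1, r2, phi, s_crit):
    """Approximate P(S >= s_crit | phi) with S ~ Normal(kappa_1 - kappa_2 log phi, kappa_2)."""
    r = r1 + r2
    k1 = r2 * (r + 1) / 2
    k2 = r1 * r2 * (r + 1) / 12
    mean = k1 - k2 * log(phi)
    # Continuity correction of 1/2 (an addition, not from the text).
    return 1.0 - normal_cdf((s_crit - 0.5 - mean) / sqrt(k2))

for phi in (1.0, 0.9, 0.8):
    print(phi, round(approx_power_upper(5, 5, phi, 36), 4))
```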

Several criteria other than $S$ have been proposed for testing randomness in the sequence. We consider here two tests that might apply in the $\phi$ situation we envisage: Mood's median test (Mood, 1950, p. 395) and the runs test. Since $S$ is sufficient for $\phi$, neither criterion can be more powerful than $S$, but it is of interest to compare their power curves under the same set of $\phi$ alternatives. We take Mood's median test first and, for the sake of example, assume an even number of elements in the sequence; the discussion is easily paralleled for an odd number. The $r$ (even) observations are ranked, divided at the median, and a $2 \times 2$ table is drawn up.


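                 | $x$       | $y$                    | Total
Below the median | $b$       | $\tfrac12 r - b$       | $\tfrac12 r$
Above the median | $r_1 - b$ | $r_2 - \tfrac12 r + b$ | $\tfrac12 r$
Total            | $r_1$     | $r_2$                  | $r$

Under the null hypothesis it is immediate that

$$p_0(b) = \binom{r_1}{b}\binom{r_2}{\tfrac12 r - b}\Big/\binom{r}{\tfrac12 r}$$

and $\mathcal{E}_0(b) = \tfrac12 r_1$. The critical region is usually taken as the sum of both tails of the distribution. The power function for $b$ can be built up from a consideration of the bivariate distribution of $S$ and $b$. We take the generating function

$$\prod_{i=1}^{r/2}(1 + hwt^{i})\prod_{j=r/2+1}^{r}(1 + wt^{j}).$$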
The coefficient of $h^{b}w^{r_1}$ will, for a given value of $b$, generate the frequency distribution of $S$ for this fixed $b$. For example, for the sequence $r_1 = 4 = r_2$ we take the generating function

$$\prod_{i=1}^{4}(1 + hwt^{i})\prod_{j=5}^{8}(1 + wt^{j})$$

and obtain the bivariate table:

[Bivariate frequency table of $S$ and $b$ for $r_1 = r_2 = 4$; the entries are not legible in the scan.]


$$p(b \mid H_1) \propto \sum_{S} p_0(S, b)\,\phi^{S} = p_0(b)\sum_{S} p_0(S \mid b)\,\phi^{S}.$$

The required probabilities for $b$ under the alternate hypothesis are therefore proportional to the coefficient of $h^{b}w^{r_1}$ $(b = 0, 1, \dots, r_1)$ in the expansion of

$$\prod_{i=1}^{r/2}(1 + hw\phi^{i})\prod_{j=r/2+1}^{r}(1 + w\phi^{j}).$$

This means that each $S$ array in the bivariate table is multiplied by the weight $\phi^{S}$ given to the marginal total of $S$, and the table is added along the $b$ arrays. For example, from the illustrative table,

$$p(b = 2) \propto \dots$$

… The small number of values which $b$ can take precludes any comparison with … for sequences (6, 4) and (7, 3).
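For short sequences the bivariate table and the weighting can be reproduced by brute force. The sketch below is illustrative only (function names are hypothetical): it enumerates every placement of the $x$'s, tabulates $(S, b)$ with $b$ the number of $x$'s among the lowest $\tfrac12 r$ ranks ($r$ assumed even), then multiplies each $S$ array by $\phi^{S}$ and adds along the $b$ arrays.

```python
from collections import defaultdict
from itertools import combinations


def joint_s_b(r1, r2):
    """Frequencies n(S, b) over all C(r, r1) placements of the x's, where
    S is the sum of the x ranks and b the number of x's below the median
    (r assumed even, so the lower half is ranks 1..r/2)."""
    r = r1 + r2
    half = r // 2
    counts = defaultdict(int)
    for xs in combinations(range(1, r + 1), r1):
        b = sum(1 for rank in xs if rank <= half)
        counts[(sum(xs), b)] += 1
    return counts


def b_distribution(counts, phi):
    """Weight each S array by phi**S, then add along the b arrays."""
    weights = defaultdict(float)
    for (s, b), n in counts.items():
        weights[b] += n * phi ** s
    total = sum(weights.values())
    return {b: w / total for b, w in sorted(weights.items())}


counts = joint_s_b(4, 4)
print(b_distribution(counts, 1.0))   # hypergeometric null distribution of b
print(b_distribution(counts, 0.8))   # distribution of b under phi = 0.8
```

With $\phi = 1$ the weights are all equal and the hypergeometric $p_0(b)$ is recovered, which is a convenient check on the enumeration.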
The regression of $b$ on $S$ in the null-hypothesis bivariate table is disjointed, but the regression of $S$ on $b$ is linear. If the dichotomy of the sequence is made between the $R$th and the $(R+1)$st elements of the sequence, then

$$\mathcal{E}(S \mid b) = \tfrac12\{r_1(r+R+1) - br\},$$

$$\sigma^2(S \mid b) = \tfrac{1}{12}\{b(R-b)(R+1) + (r_1-b)(r-R+1)(r-R-r_1+b)\}.$$

For the particular case of a median dichotomy that we have been considering, $r = 2R$ and

$$\mathcal{E}(S \mid b) = -\tfrac12 br + \tfrac14 r_1(3r+2),$$

$$\sigma^2(S \mid b) = \tfrac{1}{48}(r+2)\{4b(r_1-b) - r_1(r_1-r_2)\}.$$

The same argument can be used here as we used for finding the cumulants of $S$ under $\phi$, and we have, remembering $\theta = \log\phi$, that

$$\mathcal{E}(S \mid b, \phi) \simeq \tfrac12\{r_1(r+R+1) - br\} + \tfrac{1}{12}\theta\{b(R-b)(R+1) + (r_1-b)(r-R+1)(r-R-r_1+b)\}$$

in general, and in particular

$$\mathcal{E}(S \mid b, \phi;\ r = 2R) \simeq -\tfrac12 br + \tfrac14 r_1(3r+2) + \tfrac{1}{48}\theta(r+2)\{4b(r_1-b) - r_1(r_1-r_2)\}.$$

To this order of approximation the variance remains the same as in the null case. It will be noticed that the regression is approximately quadratic for small $\theta$. As $\theta$ increases the higher-order cumulants will play a part and the divergence from linearity will become more pronounced.

Under the null hypothesis with the split $R + R = r$, since

$$\sigma^2(b) = \frac{r_1r_2}{4(r-1)}$$

and the regression of $S$ on $b$ has slope $-\tfrac12 r$, the correlation between $S$ and $b$ is

$$\rho = -\frac{\sqrt3}{2}\,\frac{r}{\sqrt{r^2-1}} \to -\frac{\sqrt3}{2}$$

as $r$ increases. The correlation between $S$ and $b$ under the $\phi$ alternative does not appear to have any meaning (because of the non-linear regression). It might be thought, however, given the high correlation between $b$ and $S$ in the null case, that $b$ would be nearly as efficient as $S$, which is not, however, the case.
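The conditional moments and the null correlation can be confirmed by full enumeration for a small sequence. The sketch below is an illustrative check, not the authors' computation; the function name is hypothetical and a median split with $r$ even is assumed.

```python
from itertools import combinations
from math import sqrt


def regression_check(r1, r2):
    """Exact conditional mean/variance of S given b (median split, r even)
    and the null correlation between S and b, by full enumeration."""
    r = r1 + r2
    half = r // 2
    groups, pairs = {}, []
    for xs in combinations(range(1, r + 1), r1):
        s, b = sum(xs), sum(1 for rank in xs if rank <= half)
        groups.setdefault(b, []).append(s)
        pairs.append((s, b))
    cond = {}
    for b, values in sorted(groups.items()):
        m = sum(values) / len(values)
        cond[b] = (m, sum((v - m) ** 2 for v in values) / len(values))
    n = len(pairs)
    ms = sum(s for s, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    cov = sum((s - ms) * (b - mb) for s, b in pairs) / n
    var_s = sum((s - ms) ** 2 for s, _ in pairs) / n
    var_b = sum((b - mb) ** 2 for _, b in pairs) / n
    return cond, cov / sqrt(var_s * var_b)


r1, r2 = 6, 4
r = r1 + r2
cond, rho = regression_check(r1, r2)
for b, (m, v) in cond.items():
    # exact conditional mean and variance next to the median-split formulas
    print(b, m, -0.5 * b * r + 0.25 * r1 * (3 * r + 2),
          v, (r + 2) * (4 * b * (r1 - b) - r1 * (r1 - r2)) / 48)
print(rho, -sqrt(3) / 2 * r / sqrt(r * r - 1))
```

Because the null regression of $S$ on $b$ is exactly linear, the correlation does not depend on $r_1$ and $r_2$ separately, only on $r$.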
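Wald & Wolfowitz (1940) suggested $T$, the number of runs in the sequence, to be used to test the randomness of a sequence formed by two samples. For this problem we have that

$$P\{T = 2u\} = 2\binom{r_1-1}{u-1}\binom{r_2-1}{u-1}\Big/\binom{r}{r_1},$$

$$P\{T = 2u+1\} = \left\{\binom{r_1-1}{u-1}\binom{r_2-1}{u} + \binom{r_1-1}{u}\binom{r_2-1}{u-1}\right\}\Big/\binom{r}{r_1} \qquad (u = 1, 2, \dots).$$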
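The run-count formulas can be checked against direct enumeration of all arrangements. This is an illustrative Python sketch; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def runs_pmf(r1, r2, t):
    """P{T = t} for t >= 2 from the Wald-Wolfowitz formulas."""
    u, odd = divmod(t, 2)
    total = comb(r1 + r2, r1)
    if not odd:
        return 2 * comb(r1 - 1, u - 1) * comb(r2 - 1, u - 1) / total
    return (comb(r1 - 1, u - 1) * comb(r2 - 1, u)
            + comb(r1 - 1, u) * comb(r2 - 1, u - 1)) / total


def runs_by_enumeration(r1, r2):
    """Count the number of runs T over all C(r, r1) arrangements."""
    r = r1 + r2
    counts = Counter()
    for xs in combinations(range(r), r1):
        chosen = set(xs)
        seq = [i in chosen for i in range(r)]
        counts[1 + sum(seq[i] != seq[i - 1] for i in range(1, r))] += 1
    return counts


counts = runs_by_enumeration(5, 3)
for t in sorted(counts):
    print(t, counts[t] / comb(8, 5), runs_pmf(5, 3, t))
```

`math.comb(n, k)` returns 0 when `k > n`, so impossible run counts come out with probability zero without special-casing.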

… $S$ or $b$ … $S$ and $b$ can be used for one- or two-tailed tests, but … which will be comparable with the one-tailed test with $T$. The bivariate distribution of $T$ and $S$ can be enumerated quickly for small sequences. For example, for $r_1 = 4 = r_2$ we have

[Bivariate frequency table of $S$ and $T$ for $r_1 = r_2 = 4$; the entries are not legible in the scan.]


To find the probability distribution of $T$ under the $\phi$ alternative we weight the $S$ arrays as before and add up the $T$ arrays. Thus, for example, $P(T = 5)$ is proportional to the sum of the entries of the $T = 5$ array, each multiplied by $\phi^{S}$, and so on. It will be seen on referring to Table 1 that $T$ is inefficient compared with $S$ for the three illustrative sequences. Under the $\phi$ alternative the power of $T$ does not vary very much with the composition of the sequence. It is clear from considerations of symmetry that $\mathcal{E}(S \mid T)$ is constant whatever $T$. The regression of $T$ on $S$ does not appear to have an easily calculable form.
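The weighting of the $T$ arrays, and the constancy of $\mathcal{E}(S \mid T)$, can be illustrated by enumeration. In this sketch (names hypothetical, not from the paper) the symmetry argument is visible directly: reversing the sequence maps $S$ to $r_1(r+1) - S$ and leaves $T$ unchanged, so every $T$ array has mean $\tfrac12 r_1(r+1)$.

```python
from collections import defaultdict
from itertools import combinations


def joint_s_t(r1, r2):
    """Frequencies n(S, T): S = sum of x ranks, T = number of runs."""
    r = r1 + r2
    counts = defaultdict(int)
    for xs in combinations(range(1, r + 1), r1):
        chosen = set(xs)
        seq = [rank in chosen for rank in range(1, r + 1)]
        t = 1 + sum(seq[i] != seq[i - 1] for i in range(1, r))
        counts[(sum(xs), t)] += 1
    return counts


def t_distribution(counts, phi):
    """Weight the S arrays by phi**S and add up the T arrays."""
    weights = defaultdict(float)
    for (s, t), n in counts.items():
        weights[t] += n * phi ** s
    total = sum(weights.values())
    return {t: w / total for t, w in sorted(weights.items())}


def conditional_mean_s(counts):
    """Null mean of S within each T array."""
    num, den = defaultdict(int), defaultdict(int)
    for (s, t), n in counts.items():
        num[t] += s * n
        den[t] += n
    return {t: num[t] / den[t] for t in sorted(den)}


counts = joint_s_t(4, 4)
print(conditional_mean_s(counts))    # each value equals r1(r + 1)/2 = 18
print(t_distribution(counts, 0.8))
```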

The three criteria $S$, $b$ and $T$ whose powers we have discussed against a $\phi$ alternative were originally proposed to test for possible differences in location of two populations by means of tests for randomness in the sequence of two alternatives. It seemed interesting to us to try to vary the $\phi$ model under the alternate hypothesis and, by so doing, produce criteria which might be used also as tests for dispersion differences in two populations. Let us suppose that there are $r_1$ $x$'s and $r_2$ $y$'s, and that the latter fall into two classes, I and II, of $l$ and $m$ respectively ($l + m = r_2$). We further suppose the true situation is that the $l$ $y$'s should all be ranked below the $r_1$ $x$'s, which themselves should be ranked below the $m$ $y$'s. Each $x$ is compared with each $y$, and we assume that in all comparisons there is a probability $p$ that the ranker correctly assigns relative ranks to the pair, the $l$ $y$'s always being ranked with certainty below the $m$ $y$'s. As before we suppose that the result of the comparisons is a consistent sequence. Let $C$ be the number of paired comparisons the elements of which are in the correct relative order. Then the probability of any sequence with $C$ correct is proportional to

$$p^{C}q^{r_1r_2 - C} \propto \phi^{-C},$$


where $\phi = q/p$ as before. Now $C$ will be equal to the number of $x$'s ranked above the $y$'s of Class I plus the number of $x$'s ranked below the $y$'s of Class II, each count being taken over all $(x, y)$ pairs. The number of $x$'s ranked below the $y$'s of Class II will be equal to $r_1m$ minus the number of $x$'s ranked above the $y$'s of Class II. Writing $U_1$ and $U_2$ for the numbers of $x$'s ranked above the $y$'s of Classes I and II respectively, we may write this as

$$C = U_1 + (r_1m - U_2),$$

or

$$C = r_1m - U_2 + U_1.$$

Now

$$U_1 = \tfrac12 l(l+1) + r_1l - S_1,$$

where $S_1$ is the sum of the ranks of the $l$ $y$'s, and

$$U_2 = \tfrac12 m(m+1) + r_1m - (S_2 - lm),$$

where $S_2$ is the sum of the ranks of the $m$ $y$'s, so that

$$C = S_2 - S_1 + \text{constant}.$$

It follows that the difference between the sum of the ranks of the upper $m$ $y$'s and the sum of the ranks of the lower $l$ $y$'s is sufficient for $\phi$.
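The identity $C = S_2 - S_1 + \text{constant}$ can be confirmed by counting correct comparisons directly in every arrangement. This Python sketch is illustrative (names hypothetical); it takes the lowest $l$ $y$'s as Class I and evaluates the constant implied by the expressions for $U_1$ and $U_2$.

```python
from itertools import combinations


def correct_comparisons(ys, xs, l):
    """C for one arrangement: the lowest l y's are Class I, the rest Class II.
    (x, y) is correct if x is above a Class I y or below a Class II y."""
    lower, upper = ys[:l], ys[l:]
    return (sum(1 for x in xs for y in lower if x > y)
            + sum(1 for x in xs for y in upper if x < y))


def sufficiency_check(r1, r2, l):
    r, m = r1 + r2, r2 - l
    constant = l * (l + 1) // 2 + r1 * l - m * (m + 1) // 2 - l * m
    differences = set()
    for ys in combinations(range(1, r + 1), r2):      # ys is in ascending order
        xs = [k for k in range(1, r + 1) if k not in ys]
        s1, s2 = sum(ys[:l]), sum(ys[l:])
        differences.add(correct_comparisons(ys, xs, l) - (s2 - s1))
    return differences, constant


print(sufficiency_check(4, 4, 2))   # ({4}, 4)
```

A single value in the returned set means $C - (S_2 - S_1)$ is the same for every arrangement, which is what makes $S_2 - S_1$ sufficient for $\phi$.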

It will be recognized that the $\phi$ model of the previous section will lead to a situation in which the $y$ variable seems to have a greater dispersion than the $x$ variable. The criterion using the rank differences will therefore be a linear test for differences in dispersion between two variables. For $l$ known, the test reduces to finding the difference between the sum of the upper $(r_2 - l)$ and of the lower $l$ ranks of the $y$ elements. If we write

$$S^* = S_2 - S_1,$$

then the generating function for $S^*$ is the coefficient of … in …. Thus for a sequence of $r_1 = 4 = r_2$ and $l = 2$ the distribution of $S^*$ is

$S^*$    | 4 | 5 | 6  | 7  | 8  | 9 | 10 | 11 | 12 | Total
$f(S^*)$ | 5 | 8 | 13 | 14 | 14 | 8 | 5  | 2  | 1  | 70


A small value of $S^*$ possibly denotes that the dispersion of the $y$'s is too small, a large value of $S^*$ that the dispersion is possibly too big. The power function is obtained by weighting $f(S^*)$ proportionally to $\phi^{-S^*}$. Thus, in the example immediately above, the distribution of $S^*$ under the $\phi$ alternative is proportional to $f(S^*)\phi^{-S^*}$, that is, to

$$5\phi^{-4},\ 8\phi^{-5},\ 13\phi^{-6},\ \dots,\ \phi^{-12},$$

the actual power function depending on the size of the critical region. The power function for a sequence of composition (5, 5) is given in Table 2.
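A small Python sketch of this power calculation follows. It is illustrative only: the function names are hypothetical, the critical region is taken to be an upper tail $S^* \geq c$ purely for demonstration, and the weighting $f(S^*)\phi^{-S^*}$ follows from $C = S^* + \text{constant}$ under the model above.

```python
from collections import Counter
from itertools import combinations


def s_star_frequencies(r1, r2, l):
    """Null frequencies f(S*) of S* = S_2 - S_1, where S_1 is the sum of the
    lowest l y ranks and S_2 the sum of the remaining r2 - l y ranks."""
    counts = Counter()
    for ys in combinations(range(1, r1 + r2 + 1), r2):   # ys is sorted
        counts[sum(ys[l:]) - sum(ys[:l])] += 1
    return dict(sorted(counts.items()))


def upper_tail_power(freq, phi, critical):
    """P(S* >= critical) when f(S*) is weighted by phi**(-S*)."""
    weights = {s: c * phi ** (-s) for s, c in freq.items()}
    total = sum(weights.values())
    return sum(w for s, w in weights.items() if s >= critical) / total


freq = s_star_frequencies(4, 4, 2)
print(freq)   # {4: 5, 5: 8, 6: 13, 7: 14, 8: 14, 9: 8, 10: 5, 11: 2, 12: 1}
print(upper_tail_power(freq, 1.0, 11), upper_tail_power(freq, 0.7, 11))
```

At $\phi = 1$ the "power" is just the size of the region, here $3/70$; for $\phi < 1$ the weights favour large $S^*$ and the power rises.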

The p.g.f. of $S^*$ quickly becomes difficult to evaluate with increasing sequence length. We therefore follow the usual procedure and investigate approximations to the distribution of $S^*$ through its moments. These may be obtained by an approach from first principles. Let $z_1, z_2, \dots, z_{r_2}$ be the ranks of the $y$'s in ascending order; accordingly

$$S^* = -\sum_{i=1}^{l} z_i + \sum_{i=l+1}^{r_2} z_i.$$
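The problem is therefore one of finding the moments of the $i$th smallest in a sample of $r_2$ drawn from a finite population of the first $r$ natural numbers. We have that

$$P\{z_i = R\} = \binom{R-1}{i-1}\binom{r-R}{r_2-i}\Big/\binom{r}{r_2} \qquad (i \leq R \leq r - r_2 + i),$$

$$P\{(z_i = R_i) \cap (z_j = R_j)\} = \binom{R_i-1}{i-1}\binom{R_j-R_i-1}{j-i-1}\binom{r-R_j}{r_2-j}\Big/\binom{r}{r_2} \qquad (i < j,\ i \leq R_i < R_j \leq r - r_2 + j),$$

and so on. It is easy to see that

$$\mathcal{E}(z_i) = \frac{i(r+1)}{r_2+1}, \qquad \mathcal{E}\{z_i(z_i+1)\} = \frac{i(i+1)(r+1)(r+2)}{(r_2+1)(r_2+2)},$$

$$\mathcal{E}\{z_i(z_j+1)\} = \frac{i(j+1)(r+1)(r+2)}{(r_2+1)(r_2+2)} \qquad (i < j),$$

from which we have

$$\sigma^2(z_i) = \frac{(r+1)(r-r_2)\,i(r_2-i+1)}{(r_2+1)^2(r_2+2)}, \qquad \operatorname{cov}(z_i, z_j) = \frac{(r+1)(r-r_2)\,i(r_2-j+1)}{(r_2+1)^2(r_2+2)} \quad (i < j).$$

The means and variances of $S_1$, $S_2$ and $S^*$ are accordingly

$$\mathcal{E}(S_1) = \frac{l(l+1)(r+1)}{2(r_2+1)}, \qquad \mathcal{E}(S_2) = \frac{(r_2-l)(r_2+l+1)(r+1)}{2(r_2+1)},$$

$$\mathcal{E}(S^*) = \frac{(r+1)\{(r_2-l)(r_2+l+1) - l(l+1)\}}{2(r_2+1)},$$

and, writing $c = (r+1)(r-r_2)/\{(r_2+1)^2(r_2+2)\}$ and $m = r_2 - l$,

$$\sigma^2(S_1) = \tfrac{1}{12}c\,l(l+1)\{2(r_2+1)(2l+1) - 3l(l+1)\},$$

$$\sigma^2(S_2) = \tfrac{1}{12}c\,m(m+1)\{2(r_2+1)(2m+1) - 3m(m+1)\},$$

$$\operatorname{cov}(S_1, S_2) = \tfrac14 c\,l(l+1)\,m(m+1),$$

$$\sigma^2(S^*) = \tfrac{1}{12}c\left[2(r_2+1)\{l(l+1)(2l+1) + m(m+1)(2m+1)\} - 3\{l(l+1) + m(m+1)\}^2\right].$$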
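The moment formulas can be checked exactly, with rational arithmetic, against full enumeration of the $\binom{r}{r_2}$ placements. The sketch below is an illustrative check (function names hypothetical), not part of the original derivation.

```python
from fractions import Fraction
from itertools import combinations


def exact_moments(r1, r2, l):
    """Exact mean and variance of S* by enumerating all C(r, r2) placements."""
    values = [sum(ys[l:]) - sum(ys[:l])
              for ys in combinations(range(1, r1 + r2 + 1), r2)]
    n = len(values)
    mean = Fraction(sum(values), n)
    var = Fraction(sum(v * v for v in values), n) - mean ** 2
    return mean, var


def formula_moments(r1, r2, l):
    """Mean and variance of S* from the order-statistic moments."""
    r, m = r1 + r2, r2 - l
    mean = Fraction((r + 1) * ((r2 - l) * (r2 + l + 1) - l * (l + 1)), 2 * (r2 + 1))
    c = Fraction((r + 1) * (r - r2), (r2 + 1) ** 2 * (r2 + 2))
    a, b = l * (l + 1), m * (m + 1)
    bracket = 2 * (r2 + 1) * (a * (2 * l + 1) + b * (2 * m + 1)) - 3 * (a + b) ** 2
    return mean, c * bracket / 12


print(exact_moments(4, 4, 2), formula_moments(4, 4, 2))   # both give 36/5 and 84/25
```

Setting $l = 0$ recovers the familiar Wilcoxon moments $\tfrac12 r_2(r+1)$ and $\tfrac{1}{12}r_1r_2(r+1)$, a useful special case of the general expression.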


The $v$th cumulant of $S^*$ is of order $v+1$ in $r_2$, so that the standardized $v$th cumulant is of order $1 - \tfrac12 v$ in $r_2$. For $r_2$ a fixed proportion of $r$, with increasing $r$, the standardized cumulants of order greater than 2 will tend to zero, and we may therefore expect the distribution of $S^*$ to be approximately normal. To illustrate this we set out the true distribution of $S^*$ for a sequence of composition (5, 5) with $l = 3$, and the corresponding frequencies obtained from a normal curve which has the mean and variance of $S^*$.
Comparison of exact and normal frequencies of the distribution of $S^*$ for a sequence of (5, 5) with $l = 3$

$S^*$  | −2  | −1  | 0   | 1   | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10  | 11  | 12  | 13  | Total
Exact  | (entries not legible in the scan)
Normal | 1.2 | 2.0 | 4.7 | 9.3 | 16.1 | 26.2 | 31.9 | 36.6 | 36.6 | 31.9 | 26.2 | 16.1 | 9.3 | 4.7 | 2.0 | 1.2 | 252


The true and approximate distributions are close, and the normal distribution can obviously be used for estimating the distribution of $S^*$ in short sequences when the true distribution is not too asymmetrical. The approximation should not, however, be taken to absurd lengths. For example, when $r_2 = 2$ and $l = 1$, no matter what $r$, the distribution of $S^*$ will be a right-angled triangle. It will be, in fact, as follows:

Distribution of $S^*$ for any sequence when $r_2 = 2$

$S^*$    | 1       | 2       | … | $r - 2$ | $r - 1$
$f(S^*)$ | $r - 1$ | $r - 2$ | … | 2       | 1

and no normal approximation is possible. For $r_2 \geq 3$, even with $l = 1$, the distribution of $S^*$ is reasonably approximated by the normal with increasing $r$.
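The triangular case is easy to confirm: with two $y$'s and $l = 1$, $S^*$ is just the gap $z_2 - z_1$, and a gap $d$ can occur in $r - d$ positions. A minimal enumeration sketch (function name hypothetical):

```python
from collections import Counter
from itertools import combinations


def s_star_frequencies(r1, r2, l):
    """Null frequencies of S* = (sum of upper r2 - l y ranks) - (sum of lower l)."""
    counts = Counter()
    for ys in combinations(range(1, r1 + r2 + 1), r2):
        counts[sum(ys[l:]) - sum(ys[:l])] += 1
    return dict(sorted(counts.items()))


r = 9
print(s_star_frequencies(r - 2, 2, 1))   # {1: 8, 2: 7, 3: 6, 4: 5, 5: 4, 6: 3, 7: 2, 8: 1}
```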

It is obvious that it will be rare in practical problems for $l$ to be known. If $l$ is not known, then two possibilities suggest themselves as tests for dispersion under the null hypothesis: we may dichotomize the sequence as near the median of the $y$ observations as we can, or we may dichotomize the sequence near the median of all $r$ observations.


(1) Dichotomy close to the median of the $r_2$ $y$ observations

If $r_2$ is even we can divide at the median and take $r_2 - l = l$. If $r_2$ is odd we may divide by taking $2l = r_2 - 1$, $2(r_2 - l) = r_2 + 1$. Tables of the distribution of $S^*$ under such dichotomies have been calculated and are given in Table 3 (p. 179). For reasonable-sized sequences $S^*$ may be assumed to be normally distributed with mean and variance:

(i) $r_2$ even:

$$\mathcal{E}(S^*) = \frac{r_2^2(r+1)}{4(r_2+1)}, \qquad \sigma^2_{S^*} = \frac{r_1r_2(r+1)(r_2^2+2r_2+4)}{48(r_2+1)^2};$$

(ii) $r_2$ odd:

$$\mathcal{E}(S^*) = \tfrac14(r+1)(r_2+1), \qquad \sigma^2_{S^*} = \frac{r_1(r+1)(r_2^2+2r_2+9)}{48(r_2+2)}.$$
(2) Dichotomy close to the median of all the $r$ observations

When $r$ is even the sequence can be divided by the median into two equal parts. When $r$ is odd we have taken the line of dichotomy between the $R$th and the $(R+1)$st ranks, where $R = \tfrac12(r-1)$. The difference between the sum of the upper set of $y$ observations and that of the lower set of $y$ observations will be $S^*$. Tables of the distribution of $S^*$ under these conditions have already been given by us (Barton & David, 1958). For reasonable $r$, $S^*$ may be assumed to be normally distributed with mean and variance:

(i) $r$ even:

$$\mathcal{E}(S^*) = \tfrac14 rr_2, \qquad \sigma^2_{S^*} = \frac{r_1r_2(13r^2+24r+8)}{48(r-1)};$$

(ii) $r$ odd:

$$\mathcal{E}(S^*) = \frac{r_2(r+1)^2}{4r}, \qquad \sigma^2_{S^*} = \frac{r_1r_2(r+1)(13r^2+12r+3)}{48r^2}.$$
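Both sets of normal-approximation moments can be checked by exact enumeration. In the sketch below (names hypothetical, an illustrative check rather than the authors' tables) each $y$ contributes minus its rank if it lies in the lower part of the dichotomy and plus its rank otherwise; for method (1) the lower part is the lowest $\lfloor r_2/2 \rfloor$ $y$'s, for method (2) it is the ranks $1, \dots, \lfloor r/2 \rfloor$.

```python
from fractions import Fraction
from itertools import combinations


def exact(r1, r2, score):
    """Exact mean and variance of sum(score(rank, position)) over the y's."""
    vals = [sum(score(rank, i) for i, rank in enumerate(ys))
            for ys in combinations(range(1, r1 + r2 + 1), r2)]
    n = len(vals)
    mean = Fraction(sum(vals), n)
    return mean, Fraction(sum(v * v for v in vals), n) - mean ** 2


def method_one(r1, r2):
    """Dichotomy at the median of the y's."""
    r = r1 + r2
    if r2 % 2 == 0:
        return (Fraction(r2 * r2 * (r + 1), 4 * (r2 + 1)),
                Fraction(r1 * r2 * (r + 1) * (r2 * r2 + 2 * r2 + 4), 48 * (r2 + 1) ** 2))
    return (Fraction((r + 1) * (r2 + 1), 4),
            Fraction(r1 * (r + 1) * (r2 * r2 + 2 * r2 + 9), 48 * (r2 + 2)))


def method_two(r1, r2):
    """Dichotomy at the median of all r observations."""
    r = r1 + r2
    if r % 2 == 0:
        return (Fraction(r * r2, 4),
                Fraction(r1 * r2 * (13 * r * r + 24 * r + 8), 48 * (r - 1)))
    return (Fraction(r2 * (r + 1) ** 2, 4 * r),
            Fraction(r1 * r2 * (r + 1) * (13 * r * r + 12 * r + 3), 48 * r * r))


r1, r2 = 5, 4
l, R = r2 // 2, (r1 + r2) // 2
print(exact(r1, r2, lambda rank, i: -rank if i < l else rank), method_one(r1, r2))
print(exact(r1, r2, lambda rank, i: -rank if rank <= R else rank), method_two(r1, r2))
```

Under method (2) the number of $y$'s below the line is itself random, which is why its variance is much larger than under method (1).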

Either of the two statistics which we propose will be useful as a quick test for difference of dispersion for two elements in a sequence, even if the basic conditions of the model are not satisfied. We illustrate Method I on data quoted by Freeman (1953) concerning the distribution of the disease of nettlehead in hop plants which have been planted in a rectangular lattice design. According to Freeman some of the plants had died, but we assume that they are alive and healthy in order not to prejudice the issue. This assumption makes the worst conditions for the application of the test. The layout is given in Table 4 (p. 180), where N denotes a diseased plant and the vacancies are all free of disease. If the plants are attacked randomly by the disease, then the healthy and diseased plants in any one array should together form a random sequence of two alternatives. If, on the other hand, the disease tends to spread from plant to plant, then the dispersion in the sequence will be less than might be expected on the hypothesis of randomness. The analysis of each vertical array using Method I shows that the dispersion is significantly less than it should be in 6 of the 11 arrays. The mean of the ratios

$$\frac{S^* - \mathcal{E}(S^*)}{\sigma_{S^*}}$$

is equivalent to a unit normal deviate of $-3$ and so is significant.

Many variants of the models which we have set up are possible, though most of them do not lead to sufficient statistics. The variant of the previous probability model which is possibly the most interesting is when we omit the restriction that the judge is able to place the $l$ $y$'s of Class I with certainty below the $m$ $y$'s of Class II, and we suppose that he has a … where $b_i$ and $a_i$ are the numbers of $y$'s and of $x$'s in the $i$th group of each, respectively; then the probability of such a sequence is proportional to

… times the coefficient of $z^{l}$ in $\prod_i\prod_j(\dots)$,

where $A_i = \sum_{j<i} a_j$ and $B_i = \sum_{j<i} b_j$.

It seems clear that there is no sufficient statistic for $\phi$. If we expand the above expression in powers of $\delta\phi = \phi - 1$ we find that it is proportional to …
Now $U^*$ is equal to … plus twice the number of paired comparisons between $x$ and $y$. It is, therefore, essentially Wilcoxon's statistic. If the rank of the $i$th $y$ is $R_i$, and … $2R_i - i$ … It will be noticed that when $l = r_2 - 1$ the test statistic is $V$.

This $\phi$ model which we have just set up would seem to be a little difficult to translate into general terms. The intrusion of Wilcoxon's statistic suggests that the model might be used to represent a change in the parameters of location as well as those of dispersion, and there is no question but that a change in the former would mask almost entirely a change in the latter. The $V$ of the test function is very highly correlated with the sample variance of the $r_2$ $y$'s, a statistic which has already been suggested (David, 1956) as a test for dispersion. But the situation is not entirely clear and has no obvious interpretation.

In this present paper we have put forward a model for the alternate hypothesis to randomness in a sequence of two alternatives, which would seem appropriate, for example, in a situation where the order in the sequence results from a judge's ranking. We have shown how two different variants of this model lead to statistics which are optimum to detect specified departures from randomness, and have been able to suggest general interpretations of such departures. In the first case we have shown that Wilcoxon's statistic is sufficient, and we have compared it with two other criteria (Mood's median and the number of runs) which have been suggested for testing the same null hypothesis. In further discussion of ranking models alternate to randomness we shall give models for which: (i) the number of runs is sufficient, and compare it with Wilcoxon's and Mood's statistics; (ii) Mood's statistic is sufficient, and compare it with Wilcoxon's statistic and the number of runs.


then F= 
Table 1. Power distributions for $S$, $b$ and $T$ under the $\phi$ alternative


71 ra | $ 
5 5 8 
6 4 
7 3 
5 8 5 
5 51 
6 4 
7 3 


(Irregularities in the distributions are due to the device which was used to make them comparable, i.e. the assumption of continuity.)


Table 2. Power distribution for the rank difference test for dispersion under the $\phi$ alternative


none o Ims L0 0-9 0-8 0-7 0-6 0-5 


5% 0-05 | 0-086 | 0-144 | 0-235 | 0-366 | 0-533 | 0-76 
217523196 | 0-05 | 0059 | 0-089 | 0-148 | 0-249 | 0-394 | ost 


Table 3. Distribution of $S^* = S_2 - S_1$
Table 4. Nettlehead plants in a lattice design
Rank 
— — LÀ 
1 : : 
2 5 N 
3 N N 
4 N N 
5 N N 
6 N 
7 N N 
8 N P 
9 N N 
10 à N 
n N N 
12 y P 
13 N N 
14 N is 
15 $ j 
16 ‘ : 
17 8 4 
18 i H 
19 N : 
20 : : 
2 : 5 
"n de ; á 
23 : $ 
24 s 3 
25 : 
26 8 
27 : 
2 N 
30 r 
l A 
E 30 30 
n 19 20 
Ta 11 10 
i 5 5 
121 6 5 
8, 27 20 
Sa 86 50 
NE 59 30 
Mean S* 93 7045 
ot 11-9781 | 11-5052 
(Scl | -2-7968 8.4727 
Mean T 1493 | 1433 
Tp 8 2.493 2.381 
13 E 
- % | -0575 2870 


Row number 
SIMPLIFIED RUNS TESTS AND LIKELIHOOD RATIO
TESTS FOR MARKOFF CHAINS†


By LEO A. GOODMAN 
University of Chicago 


1. This paper will first discuss the 'group' or run test for randomness in a single sequence of alternatives. This test was presented by David (1947) for the case where there are two kinds of alternatives. The case where there is an arbitrary fixed number, $s > 2$, of kinds of alternatives will also be considered, where the single sequence of alternatives consists of a long chain of observations. A simple derivation of some long-sequence group (or run) tests will be presented by making use of a result due to Bartlett (1951) concerning the asymptotic distribution of the observed transition numbers in a probability chain. This derivation indicates some close relationships between standard asymptotic results for multinomial trials and certain results in the large-sample distribution theory of runs (see, for example, Mood, 1940), so that a discussion of it may further the understanding of this distribution theory. The simplified group tests, developed herein for testing hypotheses concerning a probability chain consisting of $s$ states, are also intended to illuminate some results presented by David (1947), Moore (1953) and Barton & David (1957) on group tests of randomness.

The general approach presented herein indicates that these group tests will be appropriate for testing the null hypothesis of randomness, or certain specific generalizations of this null hypothesis, against certain specific kinds of alternate hypotheses concerning Markoff chains. This general approach also leads to some simplified tests, which are similar to standard tests of independence in contingency tables, for hypotheses that have been considered by Hoel (1954) and Good (1955). These simplified 'contingency table' tests for Markoff chains, which are related to, although different from, the likelihood ratio tests given by Good and Hoel, are tests of certain specific generalizations of the null hypothesis of randomness, and are in general different from the group tests. Good, in the errata to his article, has referred the reader to the results presented in this paper and has agreed to the correction of a number of inaccuracies which will be pointed out below.
In a previous paper by Anderson & Goodman (1957), there was a discussion of hypotheses concerning Markoff chains which are quite different from the main ones described herein (e.g. the hypotheses studied in that paper did not lead directly to the development and use of group tests). This earlier paper was concerned mainly with the case where there are a large number of observed sequences from a Markoff chain of fixed (perhaps even short) length, while there was one brief section (…) on the case of a single sequence consisting of a long chain of observations.
2. David (1947) has suggested that, as a test of randomness in a sequence of trials where either the event $E$ or $\bar E$ will occur in a single trial (i.e. the case of two kinds of alternatives), the total number $k$ of observed groups of $E$ and $\bar E$ that appear in the sequence should be compared with the conditional distribution of $k$, under the null hypothesis of randomness, when the number $n_1$ of observed $E$'s and the number $n_2$ of observed $\bar E$'s in the sequence are given. It can be seen that the observed $k$ is approximately twice the number $n_{12}$ of times that an $E$ is followed by an $\bar E$ in the sequence. More precisely, the observed transition number $n_{12}$ is such that $|n_{12} - \tfrac12 k| \leq 1$. Thus, when the sequence consists of a long chain of observations, the test suggested by David may be closely approximated by comparing the observed transition number $n_{12}$ with its conditional distribution under the null hypothesis. Since the transition numbers, and the initial state, form a set of sufficient statistics for the parameters in the transition probability matrix of a Markoff chain with a constant matrix (see Anderson & Goodman, 1957), we shall discuss tests which approximate the group tests but are based directly on the transition numbers. The distribution of the transition probabilities has been studied by Bartlett (1951) and Whittle (1955), and will now be discussed here.

† … Statistical Research Center, University of Chicago, under sponsorship of … The author is indebted to T. W. Anderson and Ingram Olkin for some very helpful comments.
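The bound $|n_{12} - \tfrac12 k| \leq 1$ follows from $k = 1 + n_{12} + n_{21}$ and $|n_{12} - n_{21}| \leq 1$. The short Python sketch below (names hypothetical, states coded 1 for $E$ and 2 for $\bar E$) checks it on random sequences.

```python
import random


def runs_and_transitions(seq):
    """Number k of runs (groups) and transition number n12 (E followed by E-bar)."""
    pairs = list(zip(seq, seq[1:]))
    k = 1 + sum(a != b for a, b in pairs)
    n12 = sum(a == 1 and b == 2 for a, b in pairs)
    return k, n12


rng = random.Random(1)
for _ in range(1000):
    seq = [rng.choice((1, 2)) for _ in range(rng.randint(1, 40))]
    k, n12 = runs_and_transitions(seq)
    assert abs(n12 - k / 2) <= 1

print(runs_and_transitions([1, 1, 2, 2, 1, 2]))   # (4, 2)
```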

3. Consider a Markoff chain with transition probability matrix $P = (p_{ij})$; i.e. $p_{ij}$ is the probability that the variate takes the value $j$ at time $t$, conditional on the value having been $i$ at the previous time $t - 1$, where $p_{ij}$ is a constant independent of $t$. We shall assume that the number of states is finite, and shall number the states $1, 2, \dots, s$. David (1947) studied the case where $s = 2$, and the chain was assumed to be stationary. Suppose now that we have $n$ consecutive observations from the chain, and that the number of observed direct transitions from $i$ to $j$ is $n_{ij}$ $(i, j = 1, 2, \dots, s)$. Bartlett (1951) has shown that, if $P$ has no eigenvalues on the unit circle except the simple value $\lambda = 1$, so that the chain is ergodic and irreducible, then the $n_{ij}$ are asymptotically normally distributed with expected values $m_{ij} \sim nP_ip_{ij}$, where the asymptotic occupation probabilities of the chain are denoted by $P_i$. Whittle (1955) has shown that the variates $y_{ij} = (n_{ij} - m_{ij})/\sqrt n$ have asymptotically the frequency function

P(y) = const. exp E $$ ve s] f (1) 
tim PH py 
where 4;; has the value 1 or 0 according as i and j are equal or unequal, 29 = Ens 
I 
(= 1, 2, ...,8), EXw — 0, and y — (Yy). 


a Ty = ( n. p.) where n; = D Nij. Then Tiy (% pug DY) Po Eus -0 
an 7 


ee = EEE (i-. 00 
1 7 tim ti Pry 
Since the Jacobian of a linear transformation is constant, it can then be seen that the variates $x_{ij}$, which are a linear combination of asymptotically normal variates, have asymptotically the frequency function

$$P'(x) = \text{const.}\exp\Big[-\tfrac12\sum_i\sum_j \frac{x_{ij}^2}{P_ip_{ij}}\Big], \qquad (3)$$

where $\sum_j x_{ij} = 0$, and $x = (x_{ij})$. Let $z_{ij} = (n_{ij} - n_{i.}p_{ij})/\sqrt{n_{i.}}$. Since $n_{i.}/(nP_i)$ converges in probability to 1, the variates $x_{ij} - \sqrt{P_i}\,z_{ij} = x_{ij}\{1 - \sqrt{nP_i/n_{i.}}\}$ converge in probability to 0, and the variates $\sqrt{P_i}\,z_{ij}$ have asymptotically the frequency function (3), which is the same as the function obtained for the $x_{ij}$.


The asymptotic frequency function for the $z_{ij}$ is the same as that obtained when a fixed sample of size $n_{i.}$ is drawn from a multinomial population of classes $j = 1, 2, \dots, s$ with associated probabilities $p_{ij}$, where $n_{ij}$ is the number of elements falling in the $j$th class from the $i$th sample, and $s$ independent samples $(i = 1, 2, \dots, s)$ are drawn (see, for example, Wilks, 1944, p. 217). The maximum likelihood estimates of $p_{ij}$ are $\hat p_{ij} = n_{ij}/n_{i.}$, and $z_{ij} = (\hat p_{ij} - p_{ij})\sqrt{n_{i.}}$. The large-sample variances and covariances of the $z_{ij}$, or of the $w_{ij} = z_{ij}\sqrt{nP_i/n_{i.}} = \sqrt{nP_i}\,(\hat p_{ij} - p_{ij})$, are obtained simply from the standard asymptotic results for multinomial variates; i.e. $\sigma^2_{w_{ij}} = p_{ij}(1 - p_{ij})$, $\sigma_{w_{ij}w_{ik}} = -p_{ij}p_{ik}$ for $j \neq k$, and $\sigma_{w_{ij}w_{kl}} = 0$ for $i \neq k$ (see Bartlett, 1951, p. 93). Thus, the large-sample variances and covariances of the $\hat p_{ij}$ are obtained as standard multinomial formulae where, however, the $n_{i.}$ are replaced by their asymptotic expected values $nP_i$. We have the additional result that the $\hat p_{ij}$, when properly normed, are asymptotically normally distributed, and the frequency function is determined by $P'(z)$, where the constant in the formula can be evaluated directly from the standard asymptotic results for multinomial variates.

Since $|n_{i.} - n_i| \leq 1$, where $n_i$ is the number of observations in state $i$, it is clear that the asymptotic statements presented in this section also hold true when $n_{i.}$ is replaced by $n_i$. For the sake of simplicity, we shall deal with $n_{i.}$ rather than $n_i$ in much of what follows herein.
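The quantities of this section are simple to compute from an observed chain. The Python sketch below is illustrative (the function names and the example sequence are hypothetical): it tabulates the transition numbers $n_{ij}$, forms $\hat p_{ij} = n_{ij}/n_{i.}$, and shows that $n_{i.}$ (transitions out of $i$) and $n_i$ (observations in state $i$) differ by at most one, the difference arising only for the final observation.

```python
from collections import Counter


def transition_counts(seq, states):
    """n_ij: number of direct transitions i -> j in the observed chain."""
    counts = Counter(zip(seq, seq[1:]))
    return {(i, j): counts[(i, j)] for i in states for j in states}


def mle(n, states):
    """p-hat_ij = n_ij / n_i., with n_i. = sum_j n_ij (rows with no exits skipped)."""
    est = {}
    for i in states:
        row = sum(n[(i, j)] for j in states)
        if row:
            for j in states:
                est[(i, j)] = n[(i, j)] / row
    return est


seq = [1, 1, 2, 3, 3, 1, 2, 2, 3, 1]
states = (1, 2, 3)
n = transition_counts(seq, states)
p_hat = mle(n, states)
for i in states:
    n_i_dot = sum(n[(i, j)] for j in states)   # transitions out of state i
    n_i = seq.count(i)                          # observations in state i
    print(i, n_i_dot, n_i, [round(p_hat[(i, j)], 3) for j in states])
```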

4. Let us now reconsider the case when $s = 2$. The null hypothesis of randomness (i.e. independence of successive observations) states that $p_{12} = p_{22}$, and a long-sequence test of this hypothesis can be obtained by computing

$$v = \frac{\hat p_{12} - \hat p_{22}}{\sqrt{\hat p_0(1 - \hat p_0)\,[n_{1.}^{-1} + n_{2.}^{-1}]}},$$

where $\hat p_0 = (n_{1.}\hat p_{12} + n_{2.}\hat p_{22})/n = (n_{12} + n_{22})/n = n_{.2}/n$ and $n = n_{1.} + n_{2.}$, which is a standard procedure for comparing observed proportions from two large independent samples. From the asymptotic distribution results presented in the preceding section, we have that

$$v \sim (\hat p_{12} - \hat p_{22})\sqrt n = w_{12}/\sqrt{P_1} - w_{22}/\sqrt{P_2} + (p_{12} - p_{22})\sqrt n,$$

and the asymptotic mean of $v$ is $(p_{12} - p_{22})\sqrt n$ and the variance is $p_{12}p_{11}/P_1 + p_{22}p_{21}/P_2$. If the null hypothesis that $p_{12} = p_{22} = p_0$ is true, then $p_0 = P_2$, the mean of $v$ is 0, and the variance is

$$p_0(1 - p_0)/P_1 + p_0(1 - p_0)/P_2 = p_0 + (1 - p_0) = 1,$$

and the asymptotic distribution of $v$ is the unit normal. For any alternate hypothesis $p_{12} \neq p_{22}$, the absolute value of the asymptotic mean of $v$ approaches infinity as $n \to \infty$, while the variance remains a constant that depends on the values of $p_{12}$ and $p_{22}$.

We also have that

$$v = \frac{(n_{12} - n_{1.}n_{.2}/n)\sqrt n}{\sqrt{n_{1.}n_{2.}n_{.1}n_{.2}/n^2}}. \qquad (4)$$

Since $|n_{i.} - n_{.i}| \leq 1$,

$$v \sim \frac{(n_{12} - n_{1.}n_{2.}/n)\,n^{3/2}}{n_{1.}n_{2.}} = \frac{(n\,n_{12} - n_{1.}n_{2.})\sqrt n}{n_{1.}n_{2.}}.$$

Thus, a test of whether $v$ differs significantly from 0, which is a natural long-sequence test of the null hypothesis of randomness (i.e. $p_{12} = p_{22}$), does in fact test whether $n_{12}$ differs significantly from $n_{1.}n_{2.}/n$, or equivalently whether $2n_{12}$ differs significantly from $2n_{1.}n_{2.}/n$, which is directly related to the test suggested by David (1947).
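As a numerical check that the two-sample form of $v$ and expression (4) are the same statistic, the sketch below computes both from the transition numbers of a two-state sequence. The function names and the example sequence are hypothetical, chosen only for illustration.

```python
from math import sqrt


def v_statistic(n11, n12, n21, n22):
    """Two-sample comparison of p-hat_12 = n12/n1. and p-hat_22 = n22/n2."""
    n1_, n2_ = n11 + n12, n21 + n22          # n_1., n_2.
    n = n1_ + n2_
    p12, p22 = n12 / n1_, n22 / n2_
    p0 = (n12 + n22) / n                    # n_.2 / n
    return (p12 - p22) / sqrt(p0 * (1 - p0) * (1 / n1_ + 1 / n2_))


def v_formula_4(n11, n12, n21, n22):
    """The same statistic written as in (4)."""
    n1_, n2_ = n11 + n12, n21 + n22
    n_1, n_2 = n11 + n21, n12 + n22          # n_.1, n_.2
    n = n1_ + n2_
    return (n12 - n1_ * n_2 / n) * sqrt(n) / sqrt(n1_ * n2_ * n_1 * n_2 / n ** 2)


def transitions(seq):
    """Counts (n11, n12, n21, n22) of a two-state sequence coded 1 and 2."""
    pairs = list(zip(seq, seq[1:]))
    return tuple(pairs.count((i, j)) for i in (1, 2) for j in (1, 2))


counts = transitions([1, 1, 2, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1])
print(counts, v_statistic(*counts), v_formula_4(*counts))
```

The two functions agree to rounding error; replacing $n_{.1}, n_{.2}$ by $n_{1.}, n_{2.}$ gives the approximate form used in the text.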

Both David and Moore (1953) assume that the chain is stationary. Since it is assumed here that the observed chain is long, and that it is ergodic and irreducible so that the stationary distribution is approached, it is not necessary to assume here that the chain is stationary. …, in a communication to the present author, mentioned that, for … sequences of $E$'s and $\bar E$'s (i.e. $s = 2$), the number of runs of $E$'s, the number of runs of …, and the number of $E$'s form a sufficient set of statistics. He also prov…
statistics. When $s = 2$, there are four transition numbers; but since $|n_{12} - n_{21}| \leq 1$, the set $n_{11}, n_{12}, n_{22}$, or the set $n_{12}, n_{1.}, n_{2.}$, will for long chains be approximately a set of sufficient statistics. Also, since $n_{1.} + n_{2.} = n$ is fixed, the statistics $n_{12}$ and $n_{1.}$ will approximately form a sufficient set. Since $|n_{12} - \tfrac12 k| \leq 1$, the number $k$ of runs and the number $n_1$ of $E$'s approximate a sufficient set, and tests can be based either on the statistics $n_{12}$ and $n_1$ or on $k$ and $n_1$. The one-sided test of whether $v$ differs significantly from 0 is based on these statistics, and it is an approximation, for long chains, to the uniformly most powerful unbiased test.

The asymptotic distribution presented here for the variate $v \sim (n_{12} - n_{1.}n_{2.}/n)\,n^{3/2}/(n_{1.}n_{2.})$, when the null hypothesis of randomness is assumed, is similar to, although not identical with, an asymptotic distribution in the theory of runs (see Mood, 1940, p. 381). The result presented by Mood was obtained for the case where randomness is assumed and the ratios $n_i/n$ remain fixed as $n \to \infty$, while in our case the $n_i$ are random variables. He also deals with the case where the $n_i$ are random variables, but in that case he studies the distribution, under the null hypothesis, of $(n_{12} - np_1p_2)/\sqrt{n(p_1p_2 - 3p_1^2p_2^2)}$, where $p_{12} = p_{22} = p_2$ and $p_1 = 1 - p_2$ (Mood, 1940, p. 392). It is possible to derive the asymptotic distribution under the … nomial distribution theory. This point will be discussed further in §6.

5. David (1947) discusses the power function of the group test where there are $n_1$ observed $E$'s and $n_2$ observed $\bar E$'s. The null hypothesis considered is that $p_{12} = p_{22}$; this … $p_{11} = p_{21} = P_1$. The alternate hypothesis is that … $P_1$, where $P_1 = p_{21}/(p_{12} + p_{21})$. Since the alternate hypothesis … the test of whether

$$v \sim (\hat p_{12} - \hat p_{22})\sqrt n = (n_{12} - n_{1.}n_{2.}/n)\,n^{3/2}/(n_{1.}n_{2.})$$

differs significantly from 0 …

… the general case where $P_1 \geq 0.5$, and the observed sequence is large, which may further illuminate the preceding statement. Since $P_1p_{12} = P_2p_{21}$, then $p_{21}/p_{12} = P_1/P_2 \geq 1$. When the alternate hypothesis … is true, and it is also assumed that …, then it can be shown, simply by considering the curve $p(1-p)$ as a function of $p$, that $r = p_{22}p_{21}/(p_{12}p_{11})$ … 1, … if the null hypothesis is true.

Let us now consider the case where a sample of size $n_1$ is drawn from a binomial population with parameter $p_{12}$ and another independent sample of size $n_2$ is drawn from a binomial population with parameter $p_{22}$. The allocation of sample sizes $n_1$ and $n_2$, where $n = n_1 + n_2$ is fixed, which minimizes the variance of $v = (\hat p_{12} - \hat p_{22})\sqrt n$ is determined by the solution $x_0$ of the equation $x^2(r - 1) + 2x - 1 = 0$, where $r = p_{22}p_{21}/(p_{12}p_{11})$ and $x_0 = n_1/n$. When $r = 1$, then $x_0 = \tfrac12$, and $x_0$ decreases as the value of $r$ increases. Thus, when $r > 1$, then $x_0 < \tfrac12$. Also, when $r > 1$, in the case of two independent samples of sizes $n_1$ and $n_2$ it is clear that the variance of $v$ is smaller when $n_1/n = \tfrac12$ than when $n_1/n > \tfrac12$, but that the variance is smaller when $n_1/n = x_0 < \tfrac12$ than when $n_1/n = \tfrac12$. Thus, the variance of $v$ is minimized when $n_1/n = \tfrac12$ in the case when $r = 1$ (i.e. when $P_1 = 0.5$ or when the null hypothesis is true), and only in that case. Also, when the null hypothesis is that $p_{12} = p_{22}$ and the alternate hypothesis is that $p_{12} \neq p_{22}$, and it is also assumed that … (or $P_1 \geq 0.5$), then the variance of $v$ is smaller when $n_1 = n_2$ than when $n_1 > n_2$; however, for each alternate hypothesis where … $> P_1 > 0.5$, it is possible to determine values of $n_1$ and $n_2$ so that $n_1 < n_2$ and the variance of $v$ is smaller for these values than for $n_1 = n_2$.
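The allocation result can be illustrated numerically. Minimizing $n\,\mathrm{var}(v) = p_{12}p_{11}/x + p_{22}p_{21}/(1 - x)$ gives $x_0 = 1/(1 + \sqrt r)$, which is the root in $(0, 1)$ of $x^2(r - 1) + 2x - 1 = 0$. The sketch below (hypothetical names and example parameters, with $p_{11} = 1 - p_{12}$ and $p_{21} = 1 - p_{22}$ for two states) compares that root with a grid search.

```python
from math import sqrt


def scaled_variance(x, p12, p22):
    """n * Var(p-hat_12 - p-hat_22) when a fraction x of n goes to sample 1."""
    return p12 * (1 - p12) / x + p22 * (1 - p22) / (1 - x)


def optimal_fraction(p12, p22):
    """Root x0 in (0, 1) of x**2 (r - 1) + 2x - 1 = 0, r = p22 p21 / (p12 p11)."""
    r = p22 * (1 - p22) / (p12 * (1 - p12))
    return 1 / (1 + sqrt(r))     # equals (sqrt(r) - 1)/(r - 1) when r != 1


p12, p22 = 0.3, 0.45
x0 = optimal_fraction(p12, p22)
grid = [k / 1000 for k in range(1, 1000)]
best = min(grid, key=lambda x: scaled_variance(x, p12, p22))
print(x0, best)   # here r > 1, so both lie below 1/2
```

Writing the root as $1/(1 + \sqrt r)$ avoids the division by $r - 1$ that the textbook quadratic-formula root would need at $r = 1$.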

It was shown earlier that there is a close similarity between the asymptotic distribution of certain statistics (for example, $v$) that can be computed from an observed sequence from a Markoff chain and related statistics computed from data obtained from independent samples from multinomial populations. The preceding paragraph, which discussed the case of two independent samples from binomial populations, was intended as a suggestive introduction to certain exact results presented by David (1947). This discussion suggests that the test of randomness under discussion, when the observed sequence is large, will be most powerful when $n_1 = n_2$ if the alternate hypothesis is limited to $P_1 = 0.5$. Also, $n_1 = n_2$ will be preferable to $n_1 > n_2$ when $P_1$ is assumed to be not less than 0.5, and $n_1 = n_2$ will be preferable to $n_1 < n_2$ when $P_1 \leq 0.5$. This discussion is only suggestive and cannot serve as a proof, since (a) the exact conditional power surfaces, given $n_1$ and $n_2$, are of interest (but we have not considered herein the conditional distribution of $v$ given $n_1$ and $n_2$, but rather the asymptotic unconditional distribution, which, however, is closely related to the asymptotic conditional distribution), (b) the large-sequence results presented here for statistics from a Markoff chain depend on the fact that $n_{i.}/n$ converges in probability to $P_i$, and (c) we did not discuss directly the power of tests but rather the variance of estimates.

6. Moore (1953) presents a procedure to test the null hypothesis, $H_0$, 'that there is randomness within the sequence', and he implicitly suggests that this is the same null hypothesis considered by David (1947). In applying the procedure suggested by Moore (1953), we see that the null hypothesis that he actually considers is that $p_{11} = p_{21} = p_1$, where the common value $p_1$ must be specified under the null hypothesis, as must the values of $p_{11}$ and $p_{21}$ under the alternate hypothesis. This hypothesis thus differs from the hypothesis of randomness (i.e. independence of successive observations) considered by David (1947), which did not require the specification of $p_1$ under the null hypothesis nor the specification of $p_{11}$ and $p_{21}$ under the alternate hypothesis. Moore (1953) states that the likelihood ratio in this case is


$$L = \frac{p_{11}^{\,n_1-t}\,p_{22}^{\,n_2-t}\,(p_{12}p_{21})^{t-1}\,(P_1p_{12} + P_2p_{21})}{2\,p_1^{\,n_1}\,p_2^{\,n_2}}, \qquad (5)$$
where $2t$ is the number of groups observed. Moore focused attention 'on the case of an even number of groups, bearing in mind that the results, at any rate for a large number of groups, apply equally well to the case of an odd number of groups'.

It can be seen that the probability of obtaining a given observed sequence from a Markoff chain, when the initial state is fixed (i.e. not random), is simply $\prod_i\prod_j p_{ij}^{\,n_{ij}}$. Thus, the likelihood ratio, when the initial state is fixed, is

$$L^* = \frac{p_{11}^{\,n_{11}}\,p_{12}^{\,n_{12}}\,p_{21}^{\,n_{21}}\,p_{22}^{\,n_{22}}}{p_1^{\,n_{11}+n_{21}}\,p_2^{\,n_{12}+n_{22}}},$$

which is of the same form as the likelihood ratio obtained when the observations consist of two independent samples of sizes $n_{1\cdot} = n_{11} + n_{12}$ and $n_{2\cdot} = n_{21} + n_{22}$ from binomial populations with parameters $p_{11}$ and $p_{21}$, respectively; the null hypothesis is that $p_{11} = p_{21} = p_1$ (specified), and the alternate hypothesis is that $p_{11} \ne p_{21}$, these values also being specified. The likelihood ratio $L^{**}$, which is obtained when the initial state is not fixed, is $L^*P_1/p_1$ if the observed initial state is $E_1$, or $L^*P_2/p_2$ if it is $E_2$, in the case where it is assumed, with Moore (1953), that the Markoff chain is stationary; i.e. $P_1 = P_1p_{11} + P_2p_{21}$ is the probability that the initial state is $E_1$. Again, the close similarity between the methods of inference about Markoff chains and inference based on independent samples from multinomial, or binomial, populations is apparent.
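To make the form of $L^*$ and $L^{**}$ concrete, the following sketch tallies the transition counts $n_{ij}$ of an observed two-state sequence and evaluates both log likelihood ratios. It assumes states coded 1 and 2, fully specified transition probabilities under $H_1$, and a specified common value $p_1$ (with $p_2 = 1 - p_1$) under $H_0$; the helper names and numbers are hypothetical.

```python
import math
from collections import Counter


def transition_counts(seq):
    """n_ij: number of direct transitions i -> j in the observed sequence."""
    return Counter(zip(seq, seq[1:]))


def log_L_star(seq, p_alt, p_null):
    """log L* (initial state fixed): sum of n_ij [log p_ij - log p_j],
    because under H0 every transition into state j has probability p_j."""
    return sum(c * (math.log(p_alt[i][j]) - math.log(p_null[j]))
               for (i, j), c in transition_counts(seq).items())


def log_L_double_star(seq, p_alt, p_null):
    """log L**: adds log(P_i / p_i) for the observed initial state E_i, where
    P_1 = p21 / (p12 + p21) is the stationary probability under H1."""
    P1 = p_alt[2][1] / (p_alt[1][2] + p_alt[2][1])
    P = {1: P1, 2: 1.0 - P1}
    first = seq[0]
    return log_L_star(seq, p_alt, p_null) + math.log(P[first] / p_null[first])


p_alt = {1: {1: 0.7, 2: 0.3}, 2: {1: 0.4, 2: 0.6}}  # hypothetical H1 values
p_null = {1: 0.5, 2: 0.5}                          # H0: p11 = p21 = p1 = 0.5
seq = [1, 1, 2, 2, 2, 1]
print(math.exp(log_L_star(seq, p_alt, p_null)))
```

The sequence enters only through the four counts $n_{ij}$ (and, for $L^{**}$, the initial state), which is the point of the binomial analogy.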

The ratio $L$ presented by Moore (1953) differs somewhat from $L^*$ and $L^{**}$. We shall now examine this difference. The joint probability distribution of the observed number $n_1$ of $E_1$'s, the number $n_2$ of $E_2$'s, and the number $k$ of groups, in a sequence of a given number $n = n_1 + n_2$ of observations, is


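$$\Pr\{k, n_1, n_2 \mid H_1\} =
\begin{cases}
\binom{n_1-1}{t-1}\binom{n_2-1}{t-1}\,p_{11}^{\,n_1-t}\,p_{22}^{\,n_2-t}\,(p_{12}p_{21})^{t-1}\,(P_1p_{12} + P_2p_{21}) & \text{for } k = 2t \ge 2,\\
\left[\binom{n_1-1}{t}\binom{n_2-1}{t-1}P_1\,p_{11}^{\,n_1-t-1}p_{22}^{\,n_2-t} + \binom{n_1-1}{t-1}\binom{n_2-1}{t}P_2\,p_{11}^{\,n_1-t}p_{22}^{\,n_2-t-1}\right](p_{12}p_{21})^{t} & \text{for } k = 2t+1 \ge 3,\\
P_j\,p_{jj}^{\,n-1} & \text{for } k = 1 \text{ and } \min[n_1, n_2] = 0 \ (n_j = n),
\end{cases} \qquad (6)$$

where $t \le \min[n_1, n_2]$ and $\binom{a}{b} = 0$ for $b > a$; this fact can be seen to follow from an argument similar to that appearing in David (1947), where there is a derivation of the conditional probability, $\Pr\{k \mid n_1, n_2, H_0\}$, of $k$ given $n_1$ and $n_2$. Moore (1953) uses the symbol for the conditional probability, and implies that he is concerned with the conditional probability of $k$ given $n_1$ and $n_2$, but the formulae that he presents seem to be actually related to the unconditional joint probability $\Pr\{k, n_1, n_2 \mid H_1\}$, rather than to the conditional probability.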
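In the special case where $p_{ij} = p_j$ $(= P_j)$, the formulae for $\Pr\{k, n_1, n_2 \mid H_1\}$ reduce to

$$\Pr\{k, n_1, n_2 \mid H_0\} =
\begin{cases}
2\binom{n_1-1}{t-1}\binom{n_2-1}{t-1}\,p_1^{\,n_1}p_2^{\,n_2} & \text{for } k = 2t \ge 2,\\
\left[\binom{n_1-1}{t}\binom{n_2-1}{t-1} + \binom{n_1-1}{t-1}\binom{n_2-1}{t}\right]p_1^{\,n_1}p_2^{\,n_2} & \text{for } k = 2t+1 \ge 3,\\
p_1^{\,n_1}p_2^{\,n_2} & \text{for } k = 1 \text{ and } \min[n_1, n_2] = 0.
\end{cases} \qquad (7)$$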


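We have that

$$\sum_{t \ge 1}\left\{2\binom{n_1-1}{t-1}\binom{n_2-1}{t-1} + \binom{n_1-1}{t}\binom{n_2-1}{t-1} + \binom{n_1-1}{t-1}\binom{n_2-1}{t}\right\} = \binom{n}{n_1}.$$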
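Writing $2\min[n_1, n_2] + 1 = K$, the joint distribution of $n_1$ and $n_2$ is

$$\Pr\{n_1, n_2 \mid H_0\} = \sum_{k=2}^{K}\Pr\{k, n_1, n_2 \mid H_0\} = \binom{n}{n_1}p_1^{\,n_1}p_2^{\,n_2} \quad \text{for } \min[n_1, n_2] > 0. \qquad (8)$$

Thus, we have that

$$\Pr\{k \mid n_1, n_2, H_0\} = f_k\Big/\binom{n}{n_1} \quad \text{for } k \ge 2, \qquad (9)$$

and

$$\Pr\{1 \mid n_1, n_2, H_0\} = 1 \quad \text{for } \min[n_1, n_2] = 0, \qquad (10)$$

where $f_{2t} = 2\binom{n_1-1}{t-1}\binom{n_2-1}{t-1}$ and $f_{2t+1} = f_{2t}(n - 2t)/(2t)$.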

This is a simplification of the formula for $\Pr\{k \mid n_1, n_2, H_0\}$ given by David (1947, p. 334), since the term $\sum_{\text{all } k} f_k$, which appears in David's formula, is not evaluated explicitly there, while we find that it is equal to $\binom{n}{n_1}$ (see also Wilks, 1944, p. 203).
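The conditional distribution (9) can be checked by brute force for small samples. The sketch below codes $f_k$ from the definitions just given, enumerates all $\binom{n}{n_1}$ arrangements of $n_1$ symbols $E_1$ and $n_2$ symbols $E_2$, and compares the tallies; it also confirms that $\sum_k f_k = \binom{n}{n_1}$. The function names and the sample sizes are illustrative choices.

```python
import itertools
from collections import Counter
from math import comb


def f(k, n1, n2):
    """Number of arrangements of n1 E1's and n2 E2's with exactly k groups
    (k >= 2, n1 >= 1, n2 >= 1); math.comb returns 0 when the lower index
    exceeds the upper one, which handles the boundary cases."""
    t, odd = divmod(k, 2)
    if t < 1:
        return 0
    if not odd:
        return 2 * comb(n1 - 1, t - 1) * comb(n2 - 1, t - 1)
    return (comb(n1 - 1, t) * comb(n2 - 1, t - 1)
            + comb(n1 - 1, t - 1) * comb(n2 - 1, t))


def groups(seq):
    """k: number of groups (maximal runs of identical states)."""
    return 1 + sum(a != b for a, b in zip(seq, seq[1:]))


def brute_force_group_counts(n1, n2):
    """Enumerate every arrangement and tally its number of groups."""
    n = n1 + n2
    tally = Counter()
    for positions in itertools.combinations(range(n), n1):
        chosen = set(positions)
        tally[groups([1 if i in chosen else 2 for i in range(n)])] += 1
    return tally


n1, n2 = 4, 3
K = 2 * min(n1, n2) + 1
print([f(k, n1, n2) for k in range(2, K + 1)])  # [2, 5, 12, 9, 6, 1]
```

Dividing each entry by $\binom{7}{4} = 35$ gives $\Pr\{k \mid n_1, n_2, H_0\}$ for this example.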


The ratio $L$ computed by Moore (1953) is, in fact, $L = \Pr\{k, n_1, n_2 \mid H_1\}/\Pr\{k, n_1, n_2 \mid H_0\}$, when $k = 2t$, rather than the ratio of the conditional probabilities $\Pr\{k \mid n_1, n_2, H_1\}$ and $\Pr\{k \mid n_1, n_2, H_0\}$ as was suggested in his paper. By some direct calculation, it can be seen that $L = L^{**}$ in the case where $k = 2t$, i.e. the case considered by Moore (1953). However, when $k = 2t + 1$, the value of $L$ will, in general, differ from $L^{**}$, except when the alternate hypothesis includes the assumption that $P_1 = 0{\cdot}5$. The difference between $L$ and $L^{**}$ is due to the fact that $L^{**}$ is based on the probability of obtaining a given sequence of observations, while $L$ is based on the probability of obtaining a given set of observed values for $k$, $n_1$, and $n_2$.

Moore (1953) also considers the particular case where, under the null hypothesis, $p_{11} = p_{21} = p_1 = 0{\cdot}5$ (i.e. perfect randomness), and under the alternate hypothesis $P_1 = 0{\cdot}5$ and $p_{11}$ is equal to some particular value other than $0{\cdot}5$. Thus, under both the null and alternate hypotheses $p_{11} = p_{22}$ and $p_{12} = p_{21}$. The null hypothesis states that $p_{11} = p_{22} = 0{\cdot}5$ (thus $p_{12} = p_{21} = 0{\cdot}5$), and the alternate hypothesis states that $p_{11} = p_{22}$ is equal to some particular value other than $0{\cdot}5$.

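When $p_{11} = p_{22}$, the likelihood function of the observed sequence, when the initial state is fixed, is simply $p_{11}^{\,n_{11}+n_{22}}\,p_{12}^{\,n_{12}+n_{21}}$, which is of the same form as the likelihood function obtained when a sample of size $n_{1\cdot} + n_{2\cdot} = n - 1$ is drawn from a binomial population. Since $P_1$ and $p_1$ both equal $0{\cdot}5$, we have that

$$L^{**} = L^* = 2^{\,n-1}\,p_{11}^{\,n_{11}+n_{22}}\,p_{12}^{\,n_{12}+n_{21}}. \qquad (11)$$

Since $k = n_{12} + n_{21} + 1$, we have that

$$L = L^{**} = 2^{\,n-1}\,p_{11}^{\,n-k}\,p_{12}^{\,k-1}, \qquad (12)$$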


which can be applied for both even and odd values of $k$. We see that $L$ is of the same form as the likelihood ratio obtained in testing that the probability of a 'success', in a sample of $n - 1$ independent trials, is $0{\cdot}5$ under $H_0$ and $p_{12}$ (specified) under $H_1$, where $k - 1$ 'successes' are observed. Thus, the procedure suggested by Moore can be generalized so that it is applicable for both even and odd values of $k$, and this procedure can be simplified, since it is of the same form as the familiar tests based on the binomial distribution.
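Equation (12) says that $L$ depends on the data only through $n$ and $k$, and that it coincides with a binomial likelihood ratio for $k - 1$ 'successes' (changes of state) in $n - 1$ trials. The sketch below checks this equivalence, assuming $H_0$: all $p_{ij} = 0{\cdot}5$, and a specified $H_1$ value $p_{11} = p_{22}$ with $P_1 = 0{\cdot}5$; the function names and data are hypothetical.

```python
import math


def groups(seq):
    """k: number of groups (maximal runs of identical states)."""
    return 1 + sum(a != b for a, b in zip(seq, seq[1:]))


def log_L_moore(seq, p11):
    """log L = (n-1) log 2 + (n-k) log p11 + (k-1) log p12, with p12 = 1 - p11."""
    n, k = len(seq), groups(seq)
    return (n - 1) * math.log(2) + (n - k) * math.log(p11) + (k - 1) * math.log(1 - p11)


def log_L_binomial(successes, trials, p_success):
    """Log likelihood ratio of 'success probability = p_success' against 0.5."""
    failures = trials - successes
    return (successes * math.log(p_success / 0.5)
            + failures * math.log((1 - p_success) / 0.5))


seq = [1, 1, 2, 1, 1, 1, 2, 2]
n, k = len(seq), groups(seq)
print(k)  # 4
```

Here a 'success' is a change of state, whose probability under $H_1$ is $p_{12} = 1 - p_{11}$.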

The null and alternate hypotheses considered by Moore (1953) were simple hypotheses, while the hypotheses considered by David (1947) were composite. Let us now modify the particular case considered by Moore (1953) so that the alternate hypothesis will be composite. Consider the null hypothesis $H_0$ that $p_{11} = p_{21} = p_1 = 0{\cdot}5$, and the alternate hypothesis $H_1$ that $P_1 = 0{\cdot}5$ and $p_{11}$ is unspecified. Thus, $p_{11} = p_{22} = 0{\cdot}5$ under $H_0$, and $p_{11} = p_{22}$ is unspecified under $H_1$. Using the results presented in the preceding section, we see that a large-sample test of this hypothesis can be obtained by computing

$$u = 2\sqrt{n}\left[(n_1\hat p_{11} + n_2\hat p_{22})/n - 0{\cdot}5\right],$$

which is a standard procedure for comparing an observed proportion with $0{\cdot}5$. Since $p_{11} = p_{22}$, the estimates $\hat p_{11}$ and $\hat p_{22}$ are pooled, and the pooled estimate $(n_1\hat p_{11} + n_2\hat p_{22})/n$ is compared with $0{\cdot}5$. From the asymptotic distribution results presented in §3, we have that

$$u \approx \left(\sqrt{n_1}\,z_{11} + \sqrt{n_2}\,z_{22} + n_1p_{11} + n_2p_{22} - \tfrac12 n\right)2/\sqrt{n},$$

and the asymptotic mean of $u$ is $\sqrt{n}\,(2p_{11} - 1)$ and the variance is $4(P_1p_{11}q_{11} + P_2p_{22}q_{22})$, where $q_{ii} = 1 - p_{ii}$. Under the null hypothesis, the mean is 0, the variance is 1, and the asymptotic distribution of $u$ is the unit normal. Since $n_1\hat p_{11} + n_2\hat p_{22} = n_{11} + n_{22} = n - 1 - (n_{12} + n_{21}) = n - k$, where $k$ is the observed number of groups, we see that

$$u = \left((n - k)/n - \tfrac12\right)2\sqrt{n} = \left(\tfrac12 n - k\right)2/\sqrt{n}.$$

The statistics $n_1$ and $n_2$ do not enter into this expression for $u$; this is due to the fact that $k$ (or, more precisely, $k$ and the initial state) is a sufficient statistic when it is assumed that $p_{11} = p_{22}$. This fact can be seen to hold true by the following approach. When it is not assumed that $p_{11} = p_{22}$, the set of sufficient statistics, when $n$ is fixed, is $n_{11}$, $n_{12}$, $n_{21}$, $n_{22}$, and the initial state. As was mentioned earlier herein, this set of sufficient statistics can be approximated, for long chains, by the statistics $k$ and $n_1$. If $p_{11} = p_{22}$, then the set of sufficient statistics, when $n$ is fixed, is $n_{11} + n_{22}$ and the initial state. Since $(n_{11} + n_{22}) + (n_{12} + n_{21}) = n - 1$, this sufficient statistic can be approximated, for long chains, by the statistic $n_{12} + n_{21}$, which is approximately equal to $k$.
We have seen that the hypotheses and tests discussed in this section are quite different from those considered by David (1947). However, when the sequence of observations is long, all of these tests can be approximated by group tests, i.e. tests based on the statistics $k$, $n_1$, and $n_2$. In the special case of $s = 2$, the set of statistics $k$, $n_1$, $n_2$ will approximate a sufficient set when the observed sequence is long, and thus all reasonable tests of the hypotheses considered herein will be group tests. This will no longer be true when $s > 2$. Simplified group tests for the more general case where $s \ge 2$ will now be presented.


7. Let us assume that, for a fixed $j$, $p_{ij} = p'_j$ (unspecified) for all $i \ne j$. The null hypothesis to be tested is that $p_{jj} = p'_j$. A large-sample test of this hypothesis can be obtained by computing

$$v_j = \left[\hat p_{jj} - \sum_{i \ne j} n_{ij}\Big/(n - n_j)\right]\Big/\left[\hat p_j\hat q_j\left(\frac{1}{n_j} + \frac{1}{n - n_j}\right)\right]^{1/2},$$

where $\hat p_j = \sum_i n_{ij}/n$ and $\hat q_j = 1 - \hat p_j$, which is a standard procedure for comparing observed proportions from two large independent samples. From the asymptotic distribution results presented in §3, we have that

$$v_j \approx \left[\frac{z_{jj}}{\sqrt{n_j}} - \sum_{i \ne j}\frac{\sqrt{n_i}\,z_{ij}}{n - n_j} + p_{jj} - p'_j\right]\left[\frac{n_j(n - n_j)}{n\,P_jQ_j}\right]^{1/2};$$

the asymptotic mean of $v_j$ is $(p_{jj} - p'_j)\sqrt{n}$ and the variance is

$$\left[(1 - P_j)\,p_{jj}q_{jj} + P_j\,p'_jq'_j\right]\big/(P_jQ_j),$$

where $Q_j = 1 - P_j$, $q_{jj} = 1 - p_{jj}$, and $q'_j = 1 - p'_j$. If the null hypothesis is true, then $p_{jj} = p'_j = P_j$, and the mean is 0, the variance is 1, and the asymptotic distribution of $v_j$ is the unit normal. For any alternate hypothesis $p_{jj} \ne p'_j$, the asymptotic mean of $v_j$ approaches $\pm\infty$ as $n \to \infty$, while the variance remains a constant that depends on the values of $p_{jj}$ and $p'_j$.

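We also have that

$$v_j \approx \left[\frac{n_j(n - n_j)}{n} - \sum_{i \ne j} n_{ij}\right]\frac{n^{3/2}}{n_j(n - n_j)} \approx \left[\frac{n_j(n - n_j)}{n} - k_j\right]\frac{n^{3/2}}{n_j(n - n_j)}, \qquad (15)$$

where $k_j$ is the number of groups of observations in state $j$ in the sequence, and $k_j \approx \sum_{i \ne j} n_{ij}$.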


This result concerning the asymptotic distribution of $[k_j - n_j(n - n_j)/n]\,n^{3/2}/[n_j(n - n_j)]$, when the null hypothesis is true, is similar to, although not identical with, an asymptotic distribution in the theory of runs (see Mood, 1940, p. 383), and the method of proof is somewhat different. The distribution theory of runs discussed in Mood (1940) deals only with the hypothesis of randomness and does not consider the distributions obtained when other null and alternate hypotheses may be true, while the asymptotic distributions for certain kinds of alternate hypotheses have been given herein. The variables which appear in the proof of Mood's Theorem 6·2, p. 384, but which are not given any statistical interpretation there, are seen to have the same asymptotic distribution as that obtained, under the hypothesis of randomness, for the transition numbers $n_{ij}$ considered here; i.e. the asymptotic variances and covariances given by Mood in (6·11) are of the same form as those obtained for the related statistics when Mood's variables are replaced by the $n_{ij}$ and the simplified approach presented herein is applied. Thus, we have seen herein that the run test based on the number $k_j$ of runs of observations in state $j$ is applicable as a test of the null hypothesis that $p_{ij} = p_j$ for all $i$, against the alternate hypothesis that $p_{ij} = p'_j$ (unspecified) for all $i \ne j$ and $p_{jj} \ne p'_j$.
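The near-equivalence between $v_j$ and its runs form can be seen numerically. This sketch computes $v_j$ from transition counts, taking the $n - 1$ transitions as the total (a bookkeeping assumption that does not matter asymptotically), and computes the runs form from the number $m$ of observations in state $j$ and the number $k_j$ of runs of $j$. For a long simulated independent sequence the two agree closely. Function names and the simulated data are illustrative.

```python
import math
import random


def v_j(seq, j):
    """Compare p_jj_hat with the pooled p'_j_hat = sum_{i != j} n_ij / (N - n_j),
    using the usual two-sample standard error; N = n - 1 transitions."""
    pairs = list(zip(seq, seq[1:]))
    N = len(pairs)
    n_j = sum(1 for a, _ in pairs if a == j)                 # exits from j
    n_jj = sum(1 for a, b in pairs if a == j and b == j)
    entries = sum(1 for a, b in pairs if a != j and b == j)  # sum_{i != j} n_ij
    p_jj = n_jj / n_j
    p_other = entries / (N - n_j)
    p_pool = (n_jj + entries) / N
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_j + 1 / (N - n_j)))
    return (p_jj - p_other) / se


def v_j_runs(seq, j):
    """Runs form: [m(n-m)/n - k_j] n^(3/2) / [m(n-m)], with m the number of
    observations in state j and k_j the number of runs (groups) of j."""
    n = len(seq)
    m = seq.count(j)
    k_j = sum(1 for t, x in enumerate(seq) if x == j and (t == 0 or seq[t - 1] != j))
    return (m * (n - m) / n - k_j) * n ** 1.5 / (m * (n - m))


rng = random.Random(2024)
seq = [rng.randrange(3) for _ in range(20000)]  # independent, equiprobable states
for j in range(3):
    print(j, round(v_j(seq, j), 3), round(v_j_runs(seq, j), 3))
```

Both versions are positive when state $j$ forms fewer, longer runs than expected, i.e. when $p_{jj} > p'_j$.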

Since $j$ is given, the null hypothesis is not the same as the hypothesis of randomness. The hypothesis of randomness is, in a sense, more restrictive than the null hypothesis considered here, and the result proved here, that $v_j$ is asymptotically unit normal when $p_{ij} = p_j$ for all $i$, is a stronger result than when proved under the hypothesis of randomness. In this sense, the asymptotic result given herein, when the null hypothesis is true, is a stronger result than that obtained in the standard distribution theory of runs. If the hypothesis of randomness is, in fact, true, then the null hypothesis considered here will also be true, and the asymptotic distribution obtained for this null hypothesis will also hold in the case of randomness. Thus, the test based on $v_j$ is a long-sequence group test for a generalization of the hypothesis of randomness.


We shall now consider a generalization of the hypothesis discussed in the preceding section. Let us now assume that, for all $j$, $p_{jj} = p$ (unspecified). The null hypothesis to be tested is that $p = 1/s$. A large-sample test of this hypothesis can be obtained by computing

$$u = \left[\sum_j n_j\hat p_{jj}/n - 1/s\right]s\sqrt{n/(s - 1)}, \qquad (16)$$

where $\hat p_{jj} = n_{jj}/n_j$. From the asymptotic distribution results presented in §3, we have that

$$u \approx \left[\sum_j\sqrt{n_j}\,z_{jj} + \sum_j n_jp_{jj} - n/s\right]s\Big/\sqrt{n(s - 1)}.$$

The asymptotic mean of $u$ is $(sp - 1)\sqrt{n}/\sqrt{s - 1}$
and the variance is $\sum_j P_jp_{jj}q_{jj}\,s^2/(s - 1) = p(1 - p)s^2/(s - 1)$. Under the null hypothesis, the mean is 0, the variance is 1, and the asymptotic distribution is the unit normal. Since

$$\sum_j n_j\hat p_{jj} = \sum_j n_{jj} = \sum_j n_j - \sum_j\sum_{i \ne j} n_{ji} = n - k,$$

where $k$ is the observed number of groups, we see that

$$u = (n - k - n/s)\,s\Big/\sqrt{n(s - 1)} = \left[n(s - 1)/s - k\right]s\Big/\sqrt{n(s - 1)}.$$


It should be noticed that the test based on $u$ is not a test of randomness, since the null hypothesis is simply that $p_{jj} = 1/s$ for all $j$. In the case where $s = 2$, the null hypothesis is that $p_{ij} = 0{\cdot}5$ for all $i, j$, and the alternate hypothesis is that $p_{11} = p_{22}$. The hypothesis of 'perfect' randomness (i.e. $p_{ij} = 1/s$ for all $i, j$) is, in a sense, more restrictive, for $s > 2$, than the null hypothesis considered here, and the result proved here, that $u$ is asymptotically unit normal when $p_{jj} = 1/s$ for all $j$, is a stronger result than when proved under the hypothesis of 'perfect' randomness. Thus, the test based on $u$ is a long-sequence group test for a generalization of the hypothesis of 'perfect' randomness, which is related to the hypothesis of randomness considered in the particular case discussed by Moore (1953).
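Because $\sum_j n_{jj} = n - k$ holds exactly, the transition form (16) and the group form of $u$ coincide. The sketch below computes both, with hypothetical function names and simulated data, and checks that for $s = 2$ the group form reduces to $(\tfrac12 n - k)2/\sqrt{n}$.

```python
import math
import random


def groups(seq):
    """k: number of groups in the sequence."""
    return 1 + sum(a != b for a, b in zip(seq, seq[1:]))


def u_from_transitions(seq, s):
    """(16): [sum_j n_j p_jj_hat / n - 1/s] s sqrt(n/(s-1)); n_j p_jj_hat = n_jj."""
    n = len(seq)
    stays = sum(a == b for a, b in zip(seq, seq[1:]))  # sum_j n_jj
    return (stays / n - 1 / s) * s * math.sqrt(n / (s - 1))


def u_from_groups(seq, s):
    """Group form: [n(s-1)/s - k] s / sqrt(n(s-1))."""
    n, k = len(seq), groups(seq)
    return (n * (s - 1) / s - k) * s / math.sqrt(n * (s - 1))


rng = random.Random(7)
seq3 = [rng.randrange(3) for _ in range(500)]  # s = 3, H0 true
seq2 = [rng.randrange(2) for _ in range(400)]  # s = 2, H0 true
print(round(u_from_transitions(seq3, 3), 4), round(u_from_groups(seq3, 3), 4))
```

In practice only $n$ and $k$ need to be recorded, which is what makes this a group test.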

We have seen that the number $k_j$ of groups of observations in state $j$ in the sequence is approximately equal to $\sum_{i \ne j} n_i\hat p_{ij}$, where the $\hat p_{ij} = n_{ij}/n_i$ are maximum likelihood estimates of the first-order transition probabilities in a Markoff chain, and that the asymptotic distribution theory for the $\hat p_{ij}$ could be used to obtain results relating to the $k_j$. A similar remark applies to the number $k$ of observed groups in the sequence. It can also be seen that the number ${}_ik_j$ of groups of observations of length $i$ in state $j$ can be approximated by a function of the maximum likelihood estimates of the $(i+1)$th-order transition probabilities in a Markoff chain (see Anderson & Goodman, 1957). For example, ${}_1k_j \approx \sum_{h \ne j}\sum_{l \ne j} n_{hj}\hat p_{hjl}$, where $\hat p_{hjl}$ is the estimated second-order transition probability from $(h, j)$ to $l$.


Let us now assume that, for all $i$, $p_{ii} = p$ (unspecified) and also that, for all $i$ and $j$ with $i \ne j$, $p_{ij} = q$ (unspecified). Then $p + (s - 1)q = 1$, and the probability of obtaining a specified sequence from the Markoff chain, when the initial state is fixed, is

$$\prod_i\prod_j p_{ij}^{\,n_{ij}} = p^{\sum_i n_{ii}}\,q^{\,n - 1 - \sum_i n_{ii}} = p^{\,n-k}\,q^{\,k-1} = p^{\,n-k}(1 - p)^{k-1}(s - 1)^{1-k}.$$

Thus, in this case, $k$ is a sufficient statistic for $p$, and the maximum likelihood estimate of $p$ is $(n - k)/(n - 1) \approx (n - k)/n$. The likelihood ratio
hood ratio tests. Thus, for the case considered in this paragraph, the statistic $u$ is justified as a basis for a large-sample test of the null hypothesis that $p = 1/s$. If some simple null hypothesis other than $p = 1/s$ is of interest, $u$ can be modified accordingly, and comments analogous to those presented here will continue to hold true for the modified $u$.
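The sufficiency of $k$ can be illustrated directly from the likelihood $p^{\,n-k}\{(1 - p)/(s - 1)\}^{k-1}$ given above. The sketch below, with hypothetical function names and illustrative values of $n$, $k$, and $s$, evaluates that log likelihood, the estimate $(n - k)/(n - 1)$, and the usual $-2\log$ likelihood-ratio statistic for $p = 1/s$; it is not taken from the paper.

```python
import math


def log_lik(p, n, k, s):
    """log of p^(n-k) ((1-p)/(s-1))^(k-1), for 0 < p < 1."""
    return (n - k) * math.log(p) + (k - 1) * math.log((1 - p) / (s - 1))


def mle_p(n, k):
    """Maximum likelihood estimate of p; the data enter only through k."""
    return (n - k) / (n - 1)


def lr_statistic(n, k, s):
    """-2 log(likelihood ratio) for H0: p = 1/s against unrestricted p."""
    return 2 * (log_lik(mle_p(n, k), n, k, s) - log_lik(1 / s, n, k, s))


n, k, s = 200, 120, 3  # hypothetical sequence length, number of groups, states
print(round(mle_p(n, k), 4))  # 0.402
```

Since $n$ is fixed, the statistic is a function of $k$ alone, as is $u$.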

The comments in the preceding paragraph are closely related to the work of Barton & David (1957) (see p. 174 and Corrigenda), where they state that the use of $k$ is equivalent to the likelihood ratio test of the null hypothesis that $\theta = 0$ in the case where $p_{jj} = (1 + \theta)/s$ for all $j$, and $p_{ij} = [1 - \theta/(s - 1)]/s$ for all $i$ and $j$ with $i \ne j$. However, the discussion in the present paper has led to the suggestion that large-sample tests based on the statistic $u$ are appropriate to test the null hypothesis that $p = 1/s$ (i.e. $\theta = 0$), while Barton & David suggest implicitly a quite different statistic, viz. $v^* = (n - k - F_2/n)/\sigma_k$, where $F_2 = \sum_i n_i^2$ and

$$\sigma_k^2 = \frac{F_2\{F_2 + n(n + 1)\} - 2nF_3 - n^3}{n^2(n - 1)}, \qquad F_3 = \sum_i n_i^3.$$

The numerator of $v^*$ satisfies

$$n - \sum_j n_j^2/n - k = \sum_j\left[\frac{n_j(n - n_j)}{n} - k_j\right] \approx \sum_j v_j\,\frac{n_j(n - n_j)}{n^{3/2}}.$$

Thus, $v^*$ is asymptotically equivalent to a weighted sum of the $v_j$ statistics. Some justification for the use of large-sample tests based on $v_j$ (for a given value of $j$) was presented earlier in this section for some specific null and alternate hypotheses, and this approach can also be used to determine the asymptotic distributions of $v^*$ under both these null and alternate hypotheses when they hold true for all $j$. (Barton & David (1957, p. 171) give the asymptotic distribution under the assumption of random arrangement on a line.) It should be mentioned, however, that no justification has been presented in the present paper for the use of $v^*$ (i.e. this particular weighted sum of the $v_j$) as a large-sample test of any specified null and alternate hypotheses; and the implicit justification of its use given by Barton & David would apply as strongly (if not more strongly) to the statistic $u$ as it does to the statistic $v^*$. Their implicit justification of $v^*$ is based on their study of the conditional probability of obtaining a specified sequence from a Markoff chain, for a given composition of numbers $n_i$ of each type. Since, as we saw in the preceding paragraph, the probability of obtaining a specified sequence, for the particular hypotheses under consideration, depends only on $k$, $n$, and $p$ (i.e. $(1 + \theta)/s$), and not on the values of $n_i$ $(i = 1, 2, \ldots, s)$, and a fortiori the likelihood ratio does not depend on the $n_i$, there does not seem to be any need to study the conditional probability of obtaining a specified sequence for given values of the $n_i$. If the (unconditional) probability of obtaining a specified sequence is studied, as was done in the preceding paragraph, and if the (unconditional) distribution of the likelihood ratio is used, then justification is found for large-sample tests based on the statistic $u$, but not for tests based on the statistic $v^*$, for the particular hypotheses under consideration.
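The variance $\sigma_k^2$ used in $v^*$ refers to random arrangement of a fixed composition $n_1, \ldots, n_s$. The sketch below checks the mean $n + 1 - F_2/n$ and the variance formula exactly (with rational arithmetic) against a full enumeration of distinct arrangements for a few small compositions. It also evaluates $v^*$ in the form quoted above. The helper names and compositions are illustrative, not Barton & David's.

```python
import itertools
import math
from fractions import Fraction


def runs_mean_var(counts):
    """Exact mean and variance of the number of groups k for a random
    arrangement of counts[i] observations of type i (n = sum(counts))."""
    n = sum(counts)
    F2 = sum(c ** 2 for c in counts)
    F3 = sum(c ** 3 for c in counts)
    mean = Fraction(n + 1) - Fraction(F2, n)
    var = Fraction(F2 * (F2 + n * (n + 1)) - 2 * n * F3 - n ** 3, n ** 2 * (n - 1))
    return mean, var


def brute_force_mean_var(counts):
    """Same quantities by enumerating every distinct arrangement."""
    symbols = [i for i, c in enumerate(counts) for _ in range(c)]
    ks = [1 + sum(a != b for a, b in zip(p, p[1:]))
          for p in set(itertools.permutations(symbols))]
    mean = Fraction(sum(ks), len(ks))
    return mean, Fraction(sum(x * x for x in ks), len(ks)) - mean ** 2


def v_star(seq):
    """v* = (n - k - F2/n) / sigma_k, in the form quoted in the text."""
    n = len(seq)
    counts = [seq.count(x) for x in set(seq)]
    k = 1 + sum(a != b for a, b in zip(seq, seq[1:]))
    _, var = runs_mean_var(counts)
    return (n - k - sum(c * c for c in counts) / n) / math.sqrt(var)


print(runs_mean_var([2, 2, 1]))  # (Fraction(21, 5), Fraction(14, 25))
```

For two types the formula reduces to the familiar $2n_1n_2(2n_1n_2 - n)/\{n^2(n - 1)\}$.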

8. The group tests presented in the preceding section were for the hypothesis of randomness (or perfect randomness), and for generalizations of this hypothesis, against some specific kinds of alternate hypotheses. We shall now not restrict ourselves to these specific kinds of alternate hypotheses.

It will be convenient to deal here with a cyclic sequence of observations, as well as with the sequence of observations considered earlier. Associated with every sequence $S$ of observations is a corresponding cyclic sequence $\bar S$, defined by regarding the first element of $S$ as immediately following its last one. We shall denote properties of $\bar S$ by placing a bar over the corresponding algebraic symbol relating to $S$.

Let us first consider the null hypothesis of 'perfect' randomness ($p_{ij} = 1/s$ for all $i, j$) against the general alternate hypothesis that $p_{ij} \ne 1/s$ for some $i, j$; the usual definition of 'perfect' randomness also assumes that the chain is stationary, but we shall not be concerned with this here. From the results given in §3, we see that, under this null hypothesis, the asymptotic distribution of

$$G_1(i) = n_i\sum_j(\hat p_{ij} - 1/s)^2\big/(1/s) = s\,n_i\sum_j(\hat p_{ij} - 1/s)^2$$

will be a $\chi^2$ distribution with $s - 1$ degrees of freedom, and $G_1 = \sum_i G_1(i)$ will have an asymptotic $\chi^2$ distribution with $s(s - 1)$ degrees of freedom.
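It can be seen that, under the null hypothesis, $\bar G_1 = \sum_i\bar G_1(i) = s\sum_i\bar n_i\sum_j(\hat{\bar p}_{ij} - 1/s)^2$ will also have an asymptotic $\chi^2$ distribution with $s(s - 1)$ degrees of freedom, where $\hat{\bar p}_{ij} = \bar n_{ij}/\bar n_i$, $\bar n_{ij}$ is the number of direct observed transitions from $i$ to $j$ in $\bar S$, and $\bar n_i = \sum_j\bar n_{ij} = \sum_j\bar n_{ji}$ is the number of observations in $\bar S$ which are in $i$. We have that

$$\begin{aligned}
\bar G_1 &= \sum_i\sum_j(\bar n_{ij} - \bar n_i/s)^2\,s/\bar n_i\\
&= \sum_i\sum_j\left[(\bar n_{ij} - n/s^2) - (\bar n_i - n/s)/s\right]^2 s/\bar n_i\\
&= \sum_i\sum_j(\bar n_{ij} - n/s^2)^2\,s/\bar n_i - \sum_i(\bar n_i - n/s)^2/\bar n_i\\
&\approx \sum_i\sum_j(\bar n_{ij} - n/s^2)^2\,s^2/n - \sum_i(\bar n_i - n/s)^2\,s/n = \bar\psi_2^2 - \bar\psi_1^2 = \nabla\bar\psi_2^2,
\end{aligned}$$

where

$$\bar\psi_v^2 = \frac{s^v}{n}\sum_r(\bar n_r - n/s^v)^2$$

and $\bar n_r$ is the number of sets of $v$ consecutive observations in the sequence $\bar S$ which are $(r_1, r_2, \ldots, r_v) = r$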

(see Good, 1955). Thus, we have shown that, under the null hypothesis, the statistic $\nabla\bar\psi_2^2$, which was considered by Good (1955), is (for long sequences) essentially the sum of $s$ independent $\chi^2$ goodness-of-fit statistics $\bar G_1(i)$ $(i = 1, 2, \ldots, s)$, which are computed in the standard manner as though the $\bar n_{ij}$, for fixed $i$, were the observations in a sample of $\bar n_i$ from a multinomial population, and the null hypothesis was that of an equiprobable multinomial population, i.e. $p_{ij} = 1/s$. A similar statement can be seen to hold true for $\nabla\psi_2^2$, which is computed from $S$ rather than $\bar S$.
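The decomposition of $\bar G_1$ is an exact algebraic identity, and its approximation by $\nabla\bar\psi_2^2 = \bar\psi_2^2 - \bar\psi_1^2$ can be seen on simulated data. The sketch below counts tuples in the cyclic sequence (states coded $0, \ldots, s - 1$), computes $\bar G_1$ directly and in its rearranged form, and compares it with Good's statistic; names and the simulated sequence are illustrative.

```python
import itertools
import random
from collections import Counter


def cyclic_counts(seq, v):
    """nbar_r: occurrences of each v-tuple of consecutive observations in the
    cyclic sequence, in which seq[0] is regarded as following seq[-1]."""
    n = len(seq)
    return Counter(tuple(seq[(t + m) % n] for m in range(v)) for t in range(n))


def psi2(seq, v, s):
    """Good's psibar^2_v = (s^v / n) * sum over all s^v tuples r of (nbar_r - n/s^v)^2."""
    n, counts = len(seq), cyclic_counts(seq, v)
    expected = n / s ** v
    return s ** v / n * sum((counts[r] - expected) ** 2
                            for r in itertools.product(range(s), repeat=v))


def G1_bar(seq, s):
    """Gbar_1 = sum_i sum_j (nbar_ij - nbar_i/s)^2 s / nbar_i."""
    pairs, singles = cyclic_counts(seq, 2), cyclic_counts(seq, 1)
    return sum((pairs[(i, j)] - singles[(i,)] / s) ** 2 * s / singles[(i,)]
               for i in range(s) if singles[(i,)] for j in range(s))


def G1_bar_decomposed(seq, s):
    """Exact rearrangement: sum (nbar_ij - n/s^2)^2 s/nbar_i - sum (nbar_i - n/s)^2/nbar_i."""
    n = len(seq)
    pairs, singles = cyclic_counts(seq, 2), cyclic_counts(seq, 1)
    first = sum((pairs[(i, j)] - n / s ** 2) ** 2 * s / singles[(i,)]
                for i in range(s) for j in range(s))
    second = sum((singles[(i,)] - n / s) ** 2 / singles[(i,)] for i in range(s))
    return first - second


rng = random.Random(3)
s = 3
seq = [rng.randrange(s) for _ in range(30000)]  # perfect randomness holds
print(round(G1_bar(seq, s), 3), round(psi2(seq, 2, s) - psi2(seq, 1, s), 3))
```

The rearrangement relies on $\sum_j \bar n_{ij} = \bar n_i$, which holds exactly only for the cyclic sequence.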


For long sequences, the likelihood ratio test of the null hypothesis of perfect randomness is based essentially on the statistic

$$L_1 = \sum_i L_1(i) = 2\sum_i\sum_j n_{ij}\log(\hat p_{ij}s),$$

which is asymptotically equivalent to $G_1 \approx \nabla\psi_2^2$ under the null hypothesis. We have that

$$L_1 = 2\sum_i\sum_j n_{ij}\log n_{ij} - 2\sum_i n_i\log n_i + 2(n - 1)\log s.$$
If $L_1$ is replaced by $\bar L_1$, we have

$$\bar L_1 = 2\sum_i\sum_j\bar n_{ij}\log\bar n_{ij} - 2\sum_i\bar n_i\log\bar n_i + 2n\log s = \bar K_2 - \bar K_1 + 2n\log s = \nabla\bar K_2 + 2n\log s, \qquad (21)$$

where $\bar K_v = 2\sum_r\bar n_r\log\bar n_r$ for $v \ge 1$ (see Good, 1955). Using the terminology in Good (1955), we are testing the null hypothesis of perfect randomness, $H_{-1}$, within the alternate hypothesis of a first-order chain, $H_1$. Good states that the likelihood ratio test for $H_{-1}$ within $H_1$ is based on the statistic $\nabla K_2 - \nabla K_0$, where he defines $K_0 = 2(n - v + 1)\log(n - v + 1)$, $\bar K_0 = 2n\log n$, and $K_{-1} = \bar K_{-1} = 0$. This does not agree with the result presented here, and Good has agreed that his paper contains a number of inaccuracies; see the errata to his paper, where $\bar K_{-1}$ is redefined as $2n\log ns$ rather than 0. We have found that, when $\bar K_{-1} = 2n\log ns$ and $\bar K_0 = 2n\log n$, then $\nabla\bar K_0 = -2n\log s$ and $\bar L_1 = \nabla\bar K_2 - \nabla\bar K_0$. Also, for the non-cyclic (i.e. non-circularized) sequence, we have found that

$$L_1 = K_2 - 2\sum_i n_i\log n_i + 2(n - 1)\log s, \qquad (22)$$

rather than $K_2 - K_1 - (K_0 - K_{-1})$. Thus, the result given by Good for non-cyclic sequences remains incorrect, while the result for circularized sequences has been corrected. For the cyclic sequence, the statistic $\bar L_1$ can be written simply as $\nabla\bar K_2 - \nabla\bar K_0$, while a corresponding statement for $L_1$ is not correct in the case of the non-cyclic sequence.
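Both the non-cyclic expansion (22) and the cyclic identity $\bar L_1 = \nabla\bar K_2 - \nabla\bar K_0$ can be verified mechanically. In the sketch below, $n_i$ in $L_1$ counts transitions out of state $i$, while $\bar K_1$ uses occupancy counts; the last check in the tests shows that, for a non-cyclic sequence, replacing the former by the latter changes the value. The function names and the short sequence are hypothetical.

```python
import math
from collections import Counter


def two_sum_xlogx(counts):
    """2 * sum of c log c over the (positive) counts."""
    return 2 * sum(c * math.log(c) for c in counts if c > 0)


def L1(seq, s):
    """Non-cyclic L1 = 2 sum n_ij log(p_ij_hat s), p_ij_hat = n_ij / n_i,
    where n_i counts transitions out of state i."""
    pairs = Counter(zip(seq, seq[1:]))
    exits = Counter(a for a, _ in pairs.elements())
    return 2 * sum(c * math.log(c * s / exits[i]) for (i, _), c in pairs.items())


def L1_expanded(seq, s):
    """(22): K_2 - 2 sum n_i log n_i + 2(n - 1) log s."""
    pairs = Counter(zip(seq, seq[1:]))
    exits = Counter(a for a, _ in pairs.elements())
    return (two_sum_xlogx(pairs.values()) - two_sum_xlogx(exits.values())
            + 2 * (len(seq) - 1) * math.log(s))


def K_bar(seq, v, s):
    """Cyclic Kbar_v = 2 sum nbar_r log nbar_r (v >= 1); Kbar_0 = 2n log n and,
    following the errata to Good (1955), Kbar_{-1} = 2n log(ns)."""
    n = len(seq)
    if v == 0:
        return 2 * n * math.log(n)
    if v == -1:
        return 2 * n * math.log(n * s)
    return two_sum_xlogx(Counter(tuple(seq[(t + m) % n] for m in range(v))
                                 for t in range(n)).values())


def nabla_K_bar(seq, v, s):
    """Backward difference Kbar_v - Kbar_{v-1}."""
    return K_bar(seq, v, s) - K_bar(seq, v - 1, s)


def L1_bar(seq, s):
    """Cyclic L1bar = 2 sum nbar_ij log(nbar_ij s / nbar_i)."""
    n = len(seq)
    pairs = Counter((seq[t], seq[(t + 1) % n]) for t in range(n))
    occupancy = Counter(seq)  # nbar_i, the row (and column) totals
    return 2 * sum(c * math.log(c * s / occupancy[i]) for (i, _), c in pairs.items())


seq, s = [0, 0, 1, 2, 2, 0, 1, 1, 2, 0, 2, 1], 3  # hypothetical data
print(round(L1_bar(seq, s), 6), round(nabla_K_bar(seq, 2, s) - nabla_K_bar(seq, 0, s), 6))
```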
There seem to be some further inaccuracies concerning the non-cyclic case. Good states that $\nabla K_2 - \nabla K_1 = \nabla^2K_2$ is a special case of the likelihood ratio test for contingency tables. In other words, $K_2 - K_1 - (K_1 - K_0)$ should be equal to

$$2\sum_{i,j}n_{ij}\log(\hat p_{ij}/\hat p_j) = 2\sum_{i,j}n_{ij}\log n_{ij} - 2\sum_i n_{i\cdot}\log n_{i\cdot} - 2\sum_j n_{\cdot j}\log n_{\cdot j} + 2(n - 1)\log(n - 1), \qquad (23)$$

with $\hat p_j = n_{\cdot j}/(n - 1)$; this does not seem to us to be the case. However, for the cyclic sequence, the statement is true, i.e. $\nabla^2\bar K_2 = 2\sum_{i,j}\bar n_{ij}\log(\hat{\bar p}_{ij}/\hat{\bar p}_j)$. Also, Good states that $\nabla^2K_1$ is the likelihood ratio test for perfect randomness against randomness in general. In other words,

$$\nabla^2K_1 = K_1 - K_0 - (K_0 - K_{-1})$$

should be equal to

$$2\sum_i n_i\log(\hat p_is) = 2\sum_i n_i\log n_i - 2n\log n + 2n\log s = K_1 - 2n\log n + 2n\log s;$$

this does not seem to us to be the case, unless $K_0 = \bar K_0$ and $K_{-1} = \bar K_{-1} = 2n\log ns$. If $\bar K_{-1} = 2n\log ns$, the statement is also true for the cyclic sequence, i.e.

$$\nabla^2\bar K_1 = 2\sum_i\bar n_i\log\bar n_i - 2n\log n + 2n\log s.$$

For the non-cyclic case, it is not possible to equate directly $K_1 = 2\sum_i n_i\log n_i$ with either $2\sum_i n_{i\cdot}\log n_{i\cdot}$ or $2\sum_j n_{\cdot j}\log n_{\cdot j}$, and also the correct definition of the functions $K_0$ and $K_{-1}$ must vary with the test being considered.
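The two cyclic statements above, $\nabla^2\bar K_2$ as a contingency-table likelihood ratio and $\nabla^2\bar K_1$ as the test of perfect randomness within randomness (with $\bar K_{-1} = 2n\log ns$), can be checked numerically. The sketch below uses hypothetical helper names and a short illustrative sequence.

```python
import math
from collections import Counter


def K_bar(seq, v, s):
    """Cyclic Kbar_v = 2 sum nbar_r log nbar_r (v >= 1); Kbar_0 = 2n log n,
    Kbar_{-1} = 2n log(ns)."""
    n = len(seq)
    if v == 0:
        return 2 * n * math.log(n)
    if v == -1:
        return 2 * n * math.log(n * s)
    counts = Counter(tuple(seq[(t + m) % n] for m in range(v)) for t in range(n))
    return 2 * sum(c * math.log(c) for c in counts.values())


def nabla2_K_bar(seq, v, s):
    """Second backward difference Kbar_v - 2 Kbar_{v-1} + Kbar_{v-2}."""
    return K_bar(seq, v, s) - 2 * K_bar(seq, v - 1, s) + K_bar(seq, v - 2, s)


def M01_bar(seq):
    """Cyclic contingency-table LR: 2 sum nbar_ij log(p_ij_hat / p_j_hat),
    with p_ij_hat = nbar_ij / nbar_i and p_j_hat = nbar_j / n."""
    n = len(seq)
    pairs = Counter((seq[t], seq[(t + 1) % n]) for t in range(n))
    occupancy = Counter(seq)
    return 2 * sum(c * math.log(c * n / (occupancy[i] * occupancy[j]))
                   for (i, j), c in pairs.items())


def perfect_vs_random_bar(seq, s):
    """2 sum nbar_i log(p_i_hat s), with p_i_hat = nbar_i / n."""
    n = len(seq)
    return 2 * sum(c * math.log(c * s / n) for c in Counter(seq).values())


seq, s = [0, 0, 1, 2, 2, 0, 1, 1, 2, 0, 2, 1, 1, 1], 3  # hypothetical data
print(round(M01_bar(seq), 6), round(nabla2_K_bar(seq, 2, s), 6))
```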


It can be seen that, for long sequences, $L_1$ (and also $\bar L_1 = \nabla\bar K_2 - \nabla\bar K_0$) is, under the null hypothesis of perfect randomness, essentially the sum of $s$ independent statistics $L_1(i)$, which are computed in the standard manner as though the $n_{ij}$, for fixed $i$, were observations from $n_i$ multinomial trials. The statistics $L_1$ and $\bar L_1$ are asymptotically equivalent, under the null hypothesis, to $G_1$, which has an asymptotic $\chi^2$ distribution with $s(s - 1)$ degrees of freedom. These statistics are also asymptotically equivalent, under the null hypothesis of perfect randomness, to $\nabla\psi_2^2$, and any of these statistics can be used to test this null hypothesis against the alternate hypothesis $H_1$ when the observed sequence is long.

More generally, consider now the null hypothesis of perfect randomness, $H_{-1}$, within the alternate hypothesis $H_v$ that the chain is of the $v$th order. In other words, the alternate hypothesis is that the sequence of observations is from a Markoff chain with transition probability matrix defined in terms of the transition probabilities $p_{rj}$ that the variate takes the value $j$ at time $t$, conditional on the values having been $r_1, r_2, \ldots, r_v$ (i.e. $r$) at times $t - v, t - v + 1, \ldots, t - 1$, respectively (see Bartlett, 1951), and the null hypothesis is that $p_{rj} = 1/s$ (for all $r, j$), where $s$ is the number of states. Using an approach similar to that applied by Bartlett (1951) for $v$th-order chains, it is possible to generalize the results presented earlier herein. Thus, under the null hypothesis, the asymptotic distribution of

$$G_v(r) = n_{r\cdot}\sum_j(\hat p_{rj} - 1/s)^2\,s \qquad (24)$$

will be a $\chi^2$ distribution with $s - 1$ degrees of freedom, and $G_v = \sum_r G_v(r)$ will have an asymptotic $\chi^2$ distribution with $s^v(s - 1) = \nabla s^{v+1}$ degrees of freedom.
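It can be seen that, under the null hypothesis, $\bar G_v = \sum_r\bar G_v(r)$ will also have an asymptotic $\chi^2$ distribution with $s^v(s - 1) = \nabla s^{v+1}$ degrees of freedom. We have that

$$\bar G_v = \sum_r\sum_j(\bar n_{rj} - \bar n_r/s)^2\,s/\bar n_r = \sum_r\sum_j\left[(\bar n_{rj} - n/s^{v+1}) - (\bar n_r - n/s^v)/s\right]^2 s/\bar n_r$$

$$\phantom{\bar G_v} = \sum_r\sum_j(\bar n_{rj} - n/s^{v+1})^2\,s/\bar n_r - \sum_r(\bar n_r - n/s^v)^2/\bar n_r. \qquad (25)$$

Under the null hypothesis, $\bar n_r/n$ will converge in probability to $1/s^v$, and thus

$$\bar G_v \approx \frac{s^{v+1}}{n}\sum_r\sum_j(\bar n_{rj} - n/s^{v+1})^2 - \frac{s^v}{n}\sum_r(\bar n_r - n/s^v)^2 = \bar\psi_{v+1}^2 - \bar\psi_v^2 = \nabla\bar\psi_{v+1}^2 \qquad (26)$$

(see Good, 1955). Thus, under the null hypothesis, the statistic $\nabla\bar\psi_{v+1}^2$ is (for long sequences) essentially the sum of $s^v$ independent $\chi^2$ goodness-of-fit statistics $\bar G_v(r)$, which are computed in the standard manner as though the $\bar n_{rj}$, for fixed $r$, were the observations from $\bar n_r$ multinomial trials. A similar statement can be seen to hold true for $\nabla\psi_{v+1}^2$, which is computed from $S$ rather than $\bar S$.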
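The identity (25) and the approximation by $\nabla\bar\psi_{v+1}^2$ extend the first-order check to a chain of order $v$. The sketch below counts $v$- and $(v+1)$-tuples in the cyclic sequence and compares the three quantities on simulated data generated under perfect randomness. The function names, the seed, and the choice $s = v = 2$ are illustrative.

```python
import itertools
import random
from collections import Counter


def cyclic_counts(seq, v):
    """nbar_r for every v-tuple r of consecutive observations in the cyclic sequence."""
    n = len(seq)
    return Counter(tuple(seq[(t + m) % n] for m in range(v)) for t in range(n))


def psi2(seq, v, s):
    """psibar^2_v = (s^v / n) * sum over all s^v tuples r of (nbar_r - n/s^v)^2."""
    n, counts = len(seq), cyclic_counts(seq, v)
    expected = n / s ** v
    return s ** v / n * sum((counts[r] - expected) ** 2
                            for r in itertools.product(range(s), repeat=v))


def G_bar(seq, v, s):
    """Gbar_v = sum_r sum_j (nbar_rj - nbar_r/s)^2 s / nbar_r over observed v-tuples r."""
    n_rj, n_r = cyclic_counts(seq, v + 1), cyclic_counts(seq, v)
    return sum((n_rj[r + (j,)] - c / s) ** 2 * s / c
               for r, c in n_r.items() for j in range(s))


def G_bar_decomposed(seq, v, s):
    """Right-hand side of (25), summed over the v-tuples that occur."""
    n = len(seq)
    n_rj, n_r = cyclic_counts(seq, v + 1), cyclic_counts(seq, v)
    first = sum((n_rj[r + (j,)] - n / s ** (v + 1)) ** 2 * s / c
                for r, c in n_r.items() for j in range(s))
    second = sum((c - n / s ** v) ** 2 / c for c in n_r.values())
    return first - second


rng = random.Random(11)
s, v = 2, 2
seq = [rng.randrange(s) for _ in range(20000)]  # H_{-1} true
print(round(G_bar(seq, v, s), 3), round(psi2(seq, v + 1, s) - psi2(seq, v, s), 3))
```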


For a long sequence, the likelihood ratio test is based essentially on the statistic

$$L_v = \sum_r L_v(r) = 2\sum_r\sum_j n_{rj}\log(\hat p_{rj}s), \qquad (27)$$

which is asymptotically equivalent to $G_v$ under the null hypothesis. We have that

$$L_v = 2\sum_r\sum_j n_{rj}\log n_{rj} - 2\sum_r n_{r\cdot}\log n_{r\cdot} + 2(n - v)\log s. \qquad (28)$$

Again, we find a difference between the result presented here and that given by Good (1955). However, when the cyclic sequence is used, we have

$$\bar L_v = 2\sum_r\sum_j\bar n_{rj}\log\bar n_{rj} - 2\sum_r\bar n_r\log\bar n_r + 2n\log s = \bar K_{v+1} - \bar K_v - \nabla\bar K_0 = \nabla\bar K_{v+1} - \nabla\bar K_0.$$
It can be seen that, for long sequences, $L_v$ (and also $\bar L_v = \nabla\bar K_{v+1} - \nabla\bar K_0$) is, under the null hypothesis, essentially the sum of $s^v$ independent statistics $L_v(r)$, which are computed in the standard manner as though the $n_{rj}$, for fixed $r$, were observations from $n_{r\cdot}$ multinomial trials. The statistics $L_v$ and $\bar L_v$ are asymptotically equivalent, under the null hypothesis, to $G_v \approx \nabla\psi_{v+1}^2$, which has an asymptotic $\chi^2$ distribution with $s^v(s - 1) = \nabla s^{v+1}$ degrees of
freedom. (Thus, in the statement by Good, 1955, that ‘vy is the asymptotic form of 
Ku = VKo when H, is true’, the V should be replaced by Vy, ,; see the errata to Good's 
paper. Also, the K's should be replaced by the K's.) 
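The expansion (28) and its cyclic counterpart $\bar L_v = \nabla\bar K_{v+1} - \nabla\bar K_0$ can be verified for any order $v$. The sketch below counts $(v+1)$-tuples with or without wrap-around and evaluates both forms. Note that in the non-cyclic sequence $n_{r\cdot}$ is a row total of the $(v+1)$-tuple table rather than a $v$-tuple count. Function names and data are hypothetical.

```python
import math
from collections import Counter


def tuple_counts(seq, v, cyclic):
    """Counts of v consecutive observations; cyclic=True wraps the sequence."""
    n = len(seq)
    starts = range(n) if cyclic else range(n - v + 1)
    return Counter(tuple(seq[(t + m) % n] for m in range(v)) for t in starts)


def row_totals(n_rj):
    """n_r. = sum_j n_rj, keyed by the v-tuple r."""
    totals = Counter()
    for key, c in n_rj.items():
        totals[key[:-1]] += c
    return totals


def two_sum_xlogx(counts):
    """2 * sum of c log c over the (positive) counts."""
    return 2 * sum(c * math.log(c) for c in counts if c > 0)


def L_v(seq, v, s, cyclic=False):
    """(27): 2 sum_{r,j} n_rj log(p_rj_hat s), with p_rj_hat = n_rj / n_r."""
    n_rj = tuple_counts(seq, v + 1, cyclic)
    totals = row_totals(n_rj)
    return 2 * sum(c * math.log(c * s / totals[key[:-1]]) for key, c in n_rj.items())


def L_v_expanded(seq, v, s):
    """(28), non-cyclic: 2 sum n_rj log n_rj - 2 sum n_r. log n_r. + 2(n - v) log s."""
    n_rj = tuple_counts(seq, v + 1, cyclic=False)
    return (two_sum_xlogx(n_rj.values()) - two_sum_xlogx(row_totals(n_rj).values())
            + 2 * (len(seq) - v) * math.log(s))


def K_bar(seq, v, s):
    """Cyclic Kbar_v, with Kbar_0 = 2n log n and Kbar_{-1} = 2n log(ns)."""
    n = len(seq)
    if v == 0:
        return 2 * n * math.log(n)
    if v == -1:
        return 2 * n * math.log(n * s)
    return two_sum_xlogx(tuple_counts(seq, v, cyclic=True).values())


seq, s, v = [0, 1, 1, 2, 0, 0, 2, 1, 2, 2, 0, 1, 0, 2, 2, 1], 3, 2  # hypothetical
print(round(L_v(seq, v, s), 6), round(L_v(seq, v, s, cyclic=True), 6))
```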

The preceding results in this section were concerned with the null hypothesis of 'perfect' randomness, $H_{-1}$. Now let us consider the null hypothesis of randomness (i.e. independence of successive observations), $H_0$, against the alternate hypothesis $H_1$. Under the null hypothesis that $p_{ij} = p_j$ (unspecified) for all $i$, the asymptotic distribution of the statistic

$$F_{0,1} = \sum_i n_{i\cdot}\sum_j(\hat p_{ij} - \hat p_j)^2/\hat p_j = \sum_i\sum_j\frac{(n_{ij} - n_{i\cdot}n_{\cdot j}/n_{\cdot\cdot})^2}{n_{i\cdot}n_{\cdot j}/n_{\cdot\cdot}} \qquad (29)$$

(where $n_{\cdot\cdot} = n - 1$ and $\hat p_j = n_{\cdot j}/n_{\cdot\cdot}$), which is similar to the statistic used as a test of homogeneity of proportions in $s$ independent samples or a test of independence in an $s \times s$ contingency table, will be a $\chi^2$ distribution with $(s - 1)^2$ degrees of freedom (see Anderson & Goodman, 1957).
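Also, the likelihood ratio statistic

$$M_{0,1} = 2\sum_{i,j}n_{ij}\log(\hat p_{ij}/\hat p_j) = 2\sum_{i,j}n_{ij}\log n_{ij} - 2\sum_i n_{i\cdot}\log n_{i\cdot} - 2\sum_j n_{\cdot j}\log n_{\cdot j} + 2(n - 1)\log(n - 1) \qquad (30)$$

is asymptotically equivalent to $F_{0,1}$ under the null hypothesis. Using the cyclic sequence, $\bar F_{0,1}$ is asymptotically equivalent, under the null hypothesis, to $F_{0,1}$, and also to $\bar M_{0,1}$, which is equal to

$$2\sum_{i,j}\bar n_{ij}\log\bar n_{ij} - 4\sum_i\bar n_i\log\bar n_i + 2n\log n = \bar K_2 - \bar K_1 - (\bar K_1 - \bar K_0) = \nabla^2\bar K_2.$$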
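The Pearson statistic (29) and the likelihood-ratio statistic (30) are computed from the same $s \times s$ table of transition counts. The sketch below builds that table from a non-cyclic sequence, evaluates both forms of (30), and shows on simulated independent data that $F_{0,1}$ and $M_{0,1}$ are close, as the asymptotic equivalence suggests. Names, seed, and sample size are illustrative.

```python
import math
import random
from collections import Counter


def transition_table(seq):
    """Non-cyclic counts n_ij with row totals n_i. and column totals n_.j."""
    pairs = Counter(zip(seq, seq[1:]))
    rows, cols = Counter(), Counter()
    for (i, j), c in pairs.items():
        rows[i] += c
        cols[j] += c
    return pairs, rows, cols


def F01(seq):
    """Pearson statistic (29), with n.. = n - 1."""
    pairs, rows, cols = transition_table(seq)
    total = len(seq) - 1
    return sum((pairs[(i, j)] - rows[i] * cols[j] / total) ** 2
               / (rows[i] * cols[j] / total)
               for i in rows for j in cols)


def M01(seq):
    """Likelihood-ratio statistic (30), first form: 2 sum n_ij log(p_ij_hat / p_j_hat)."""
    pairs, rows, cols = transition_table(seq)
    total = len(seq) - 1
    return 2 * sum(c * math.log(c * total / (rows[i] * cols[j]))
                   for (i, j), c in pairs.items())


def M01_expanded(seq):
    """(30), second form, written with 2 x log x terms."""
    pairs, rows, cols = transition_table(seq)

    def two_xlogx(values):
        return 2 * sum(x * math.log(x) for x in values)

    total = len(seq) - 1
    return (two_xlogx(pairs.values()) - two_xlogx(rows.values())
            - two_xlogx(cols.values()) + 2 * total * math.log(total))


rng = random.Random(5)
seq = [rng.randrange(3) for _ in range(20000)]  # H0 (independence) true, s = 3
print(round(F01(seq), 3), round(M01(seq), 3))   # both roughly chi-square on 4 df
```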


The articles referred to herein, which discuss tests of the null hypothesis of randomness, assumed (either explicitly or implicitly) that all probabilities $p_j$ were positive; we have likewise assumed this in the preceding sections. Also, when indicating the number of degrees of freedom for some statistics (which were asymptotically $\chi^2$) relating to tests for certain null hypotheses concerning Markoff chains, these articles assumed (usually implicitly) that all the transition probabilities in the Markoff chain were positive. For the sake of simplicity, we shall do likewise here when indicating the size of certain contingency tables (and thus the number of degrees of freedom for the $\chi^2$ statistics corresponding to these tables). If some transition probabilities are zero, then the methods developed in the present paper can be applied in a straightforward manner to obtain analogous results (see, for example, Bartlett, 1951).
Consider now the null hypothesis $H_u$ within the alternate hypothesis $H_v$ $(u < v)$. Let $n_r$ be the number of sets of $v$ consecutive observations in the sequence which are

$$(r_1, r_2, \ldots, r_v) = (r(1), r(2)) = r,$$

where $r(1) = (r_1, \ldots, r_{v-u})$ and $r(2) = (r_{v-u+1}, \ldots, r_v)$ (when $u = 0$, $r(1) = r$ and the symbol $r(2)$ can be neglected). The alternate hypothesis is that the sequence of observations is from a Markoff chain with transition probability matrix defined in terms of the transition probabilities $p_{rj} = p_{r(1)r(2)j}$ that the variate takes the
value $j$ at time $t$, conditional on the values having been $r_1, r_2, \ldots, r_{v-u}, r_{v-u+1}, \ldots, r_v$ at times $t - v, t - v + 1, \ldots, t - u - 1, t - u, \ldots, t - 1$, respectively, and the null hypothesis $H_u$ is that

$$p_{r(1)r(2)j} = p_{r(2)j} \ \text{(unspecified) for all } r(1). \qquad (31)$$

Let

$$n_{r(1)r(2)\cdot} = \sum_j n_{r(1)r(2)j}, \qquad (32)$$

$$N_{r(2)j} = \sum_{r(1)} n_{r(1)r(2)j}, \qquad (33)$$

and

$$N_{r(2)\cdot} = \sum_j N_{r(2)j} = \sum_{r(1)} n_{r(1)r(2)\cdot}. \qquad (34)$$

Then, under the null hypothesis $H_u$, the asymptotic distribution of the statistic

$$F_{u,v}[r(2)] = \sum_{r(1)} n_{r(1)r(2)\cdot}\sum_j\frac{(\hat p_{r(1)r(2)j} - \hat p_{r(2)j})^2}{\hat p_{r(2)j}} = \sum_{r(1)}\sum_j\frac{\left(n_{r(1)r(2)j} - n_{r(1)r(2)\cdot}N_{r(2)j}/N_{r(2)\cdot}\right)^2}{n_{r(1)r(2)\cdot}N_{r(2)j}/N_{r(2)\cdot}} \qquad (35)$$

(where $\hat p_{r(2)j} = N_{r(2)j}/N_{r(2)\cdot}$), which is similar to the statistic used as a test of homogeneity of proportions in $s^{v-u}$ independent samples or a test of independence in an $s^{v-u} \times s$ contingency table, will be a $\chi^2$ distribution with $(s^{v-u} - 1)(s - 1)$ degrees of freedom (see Anderson & Goodman, 1957). Also, the statistics $F_{u,v}[r(2)]$, for different sets of $r(2)$, will be asymptotically independent, and the asymptotic distribution of the sum of these $s^u$ statistics, $F_{u,v} = \sum_{r(2)}F_{u,v}[r(2)]$, is a $\chi^2$ distribution with $s^u(s^{v-u} - 1)(s - 1) = (s^v - s^u)(s - 1) = \nabla s^{v+1} - \nabla s^{u+1}$ degrees of freedom.
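The likelihood ratio statistic

$$M_{u,v} = 2\sum_{r(2)}\sum_{r(1)}\sum_j n_{r(1)r(2)j}\log\left(\hat p_{r(1)r(2)j}/\hat p_{r(2)j}\right)$$

$$\phantom{M_{u,v}} = 2\sum_{r,j}n_{rj}\log n_{rj} - 2\sum_r n_{r\cdot}\log n_{r\cdot} - 2\sum_{r(2)}\sum_j N_{r(2)j}\log N_{r(2)j} + 2\sum_{r(2)}N_{r(2)\cdot}\log N_{r(2)\cdot} \qquad (36)$$

is asymptotically equivalent to $F_{u,v}$ under the null hypothesis. Using the cyclic sequence, $\bar F_{u,v}$ is asymptotically equivalent, under the null hypothesis, to $F_{u,v}$, and also to $\bar M_{u,v}$, which is equal to

$$2\sum_{r,j}\bar n_{rj}\log\bar n_{rj} - 2\sum_r\bar n_r\log\bar n_r - 2\sum_{r(2)}\sum_j\bar N_{r(2)j}\log\bar N_{r(2)j} + 2\sum_{r(2)}\bar N_{r(2)}\log\bar N_{r(2)}$$

$$= \bar K_{v+1} - \bar K_v - (\bar K_{u+1} - \bar K_u) = \nabla\bar K_{v+1} - \nabla\bar K_{u+1}. \qquad (37)$$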


It can be seen that, for long sequences, Jf, , = VK VK „a = L,— L, is, under the null 
hypothesis, essentially the sum of $s^v$ independent statistics $F_{u,v}[r(2)]$, which are computed in the standard manner for testing the homogeneity of proportions as though the $n_{r(1)r(2)j}$ were observations in $s^{u-v}$ independent samples of sizes $n_{r(1)r(2)}$ ($r(2)$ fixed) from multinomial populations.
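The homogeneity statistic (35) and the likelihood ratio statistic (36) can be computed directly from the transition counts. The Python sketch below illustrates this under stated assumptions: states are coded 0, ..., s-1, the sequence is treated as non-cyclic, and the function name `markov_order_test` is hypothetical. Cells whose pooled proportion is zero are skipped, so the nominal degrees of freedom $(s^u - s^v)(s-1)$ overstate the effective number when some transitions never occur.

```python
import math
from collections import Counter


def markov_order_test(seq, s, u, v):
    """Pearson and likelihood-ratio statistics for H_v: P_{r(1)r(2)j} = P_{r(2)j}.

    seq  : observed states coded 0, 1, ..., s-1
    u, v : alternative and null orders (v < u); r(1) holds the u - v oldest
           preceding values and r(2) the v most recent ones.
    Returns (pearson, likelihood_ratio, nominal_degrees_of_freedom).
    """
    triple = Counter()                      # n_{r(1) r(2) j}
    for t in range(u, len(seq)):
        past = tuple(seq[t - u:t])          # values at times t-u, ..., t-1
        triple[(past[:u - v], past[u - v:], seq[t])] += 1

    row = Counter()                         # n_{r(1) r(2)}
    pooled = Counter()                      # n_{r(2) j}
    pooled_total = Counter()                # n_{r(2)}
    for (r1, r2, j), count in triple.items():
        row[(r1, r2)] += count
        pooled[(r2, j)] += count
        pooled_total[r2] += count

    pearson = 0.0
    lr = 0.0
    for (r1, r2), n_row in row.items():
        for j in range(s):
            p_pool = pooled[(r2, j)] / pooled_total[r2]
            if p_pool == 0.0:               # j never follows r(2): no contribution
                continue
            count = triple[(r1, r2, j)]
            p_row = count / n_row
            pearson += n_row * (p_row - p_pool) ** 2 / p_pool
            if count > 0:
                lr += 2.0 * count * math.log(p_row / p_pool)
    df = (s ** u - s ** v) * (s - 1)
    return pearson, lr, df


if __name__ == "__main__":
    print(markov_order_test([0, 1, 0, 1, 0, 1], s=2, u=1, v=0))
```

With u = 1, v = 0 the test reduces to the familiar chi-square test of independence between successive states.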


We have that 
e ^ n n D X 
Vi, = bp P X Dv, cen — Ty, coy[s]* / [Pans ll EEG elfe; (38) 
When H 


3 is true, then n converges in probability to 1/s, and 
F. d. — G, V- vy? 


PH 
From this discussion, we see that the strong analogy, which is mentioned by Good (1955),
between the expressions VR. - VX. and Vy3,, V seems to be related to the fact 
that, if the null hypothesis H, is true, there is a strong analogy between F, p» M, , F, , and 
M, , = VK,,, VK „+1; and when H , is true, these expressions are asymptotically equi- 
valent to G, — G, Vila - Way. When Hi is not true, the present writer does not see 
a strong analogy between the expressions VK,,, — VK p and Vii, - Vy... 
MOMENT GENERATING FUNCTIONS OF QUADRATIC FORMS
IN SERIALLY CORRELATED NORMAL VARIABLES 


By ROY B. LEIPNIK 
Mathematics Department, University of Washington, Seattle 


1. SUMMARY 


A method used by Kac in the study of Wiener functionals is adapted to the problem of 
calculating in closed form the joint moment generating functions of linear combinations of 
quadratic forms (not simultaneously diagonable) in serially correlated normal variables. A class of Gaussian processes is found for which this method is successful.

The results are worked out in detail for the special case of the Uhlenbeck-Ornstein process, which includes first order stochastic difference equations with constant coefficient $\rho$. Exact moments of several estimates of the variance and autocorrelation are studied. Asymptotic results as the number $n+1$ of observations $\to \infty$ and the interval $h$ between observations $\to 0$ are derived under various assumptions on the limit of $T = nh$.


2. INTRODUCTION 
Testing for serial correlation depends fundamentally, as pointed out by Koopmans (1942), 
on the joint distribution of quadratic forms which cannot be simultaneously diagonalized. 
The resulting complications have led investigators to introduce various simplifications and 
approximations in order to cope with this distribution problem. These are 

(a) the circular problem (R. L. Anderson, 1942; Leipnik, 1947; Quenouille, 1949); 

(b) test for zero correlation (Von Neumann, 1941; Koopmans, 1942; Quenouille, 1949;
T. W. Anderson, 1948); 

(c) regression on selected observations (Ogawara, 1951); 

(d) omission of selected observations (Durbin & Watson, 1951); 

(e) approximate distributions (R. L. Anderson, 1942; Koopmans, 1942; Rubin, 1947; 
Leipnik, 1947; Daniels, 1956; Jenkins, 1956). 

Daniels (1956) has recently obtained the exact characteristic function of several quotients 
of quadratic forms in normal variables by the use of a difference equation technique for 
calculating determinants. This method is well adapted to finding asymptotic expansions
for the distributions. His results are closely related to some of ours. 

A general theory of estimation in stochastic processes has been constructed by Grenander 
& Rosenblatt (1952, 1953, 1954) in a series of papers, and they have derived important 
asymptotic results on the moments of quadratic forms used in estimating the spectral
density of a stationary time series. Parzen (in an unpublished paper) has generalized their 
approach to stationary continuous processes. 

For a special class of processes, Rubin and Savage (see Rubin, 1947) have proved that 
certain quadratic statistics converge in probability to process parameters as $n \to \infty$ and $h \to 0$.

None of these results, however, answers the question of the joint distribution of the
quadratic forms in the unmodified process. 

Kac (1946) has succeeded in calculating the moment generating functions of some very 
interesting Wiener functionals by using the theory of eigen solutions of symmetric integral 
operators. In this paper, we adapt the method of Kac to our problem. A generalization of the Kac method, which may have an independent interest, is included. We then obtain an explicit closed form for the joint moment generating function of linear combinations of four quadratic forms. These statistics include those commonly used in the estimation of serial parameters. Moments are, of course, then obtainable. We work out these results in detail in the case of the Uhlenbeck-Ornstein (1930) process, and write down the first two moments. Finally, we derive some asymptotic results on the behaviour of the estimates as $n \to \infty$, $h \to 0$.

The distributions of quotients can also be obtained from the above by use of the inversion formulas of Gurland and others. The extremely complicated form of the result suggests that the methods of Daniels (1956) and Jenkins (1956), which lead quickly to asymptotic expansions, are better suited to this application. However, our asymptotic results indicate that the squared successive difference estimator of the coefficient $\rho$ due to Von Neumann is in certain respects superior to the quotient type of estimator.


3. A CLASS OF SERIALLY CORRELATED PROCESSES


The method of Kac can be successfully applied to an interesting class of processes, which includes some non-Markov and non-stationary processes. A process $X(t)$ will be called serially stationary in case

(a) $m(t) = E[X(t)] = 0$,

(b) there exist functions $a(h)$ and $v(h)$ such that $E\{[X(t+h) - a(h)X(t)]^2\} = v(h)$ for all $t$ and all $h \ge 0$,

(c) the linear combinations

$X(t_2) - a(t_2 - t_1)X(t_1),\ \ldots,\ X(t_{n+1}) - a(t_{n+1} - t_n)X(t_n)$

are uncorrelated whenever $t_1 < t_2 < \cdots < t_{n+1}$.

Let $r(t_1, t_2) = E[X(t_1)X(t_2)]$ be the autocovariance function of the process $X(t)$. Analytically, three distinct possibilities exist for $a(h)$, $v(h)$ and $r(t_1, t_2)$ for serially stationary processes.
These are 

where At- is arbitrary. 
(I) a(h) = 1, v(h) = o%h, r(ty ta) = a* min (ty) + Alh) + Ath), 
(11) a(h) = p, w(h) -a, rl laste) oth ph (6) A), where p+0, 1, 

and A(t) is arbitrary. ai-ti 

(LIT) a = 0, v(h) = 04, rlbb) = 78h — ms. Bh 

(I) is essentially the Wiener process which represents classical jr m Rs om. Ts 
non-stationary, and it is Markov if and only if, Alt) = const. or const. p", and Markov 
process of type (II) is stationary if 3 * R Hiss type met frequently 
if and only if A(t) = const. or A(t) = const. — . * process. The type (III) 
in applications; when Gaussian, it is called the Uhlen e eee 
process is stationary and uncorrelated, and isoften eer on 
we do not consider it further.

There is no difficulty in showing that the processes with $a(h)$, $v(h)$ and $r(t_1, t_2)$ as above are in fact serially stationary. However, deduction of expressions of the above three types is more difficult and involves considerable manipulation; there is insufficient space for it here.

The most interesting class of serially stationary processes is that for which there exists an $\alpha = \alpha(\rho, t_1, t_{n+1})$ such that

(d) $X(t_1) - \alpha X(t_{n+1})$ is uncorrelated with $X(t_{j+1}) - a(t_{j+1} - t_j)X(t_j)$ for $j = 1, 2, \ldots, n$.




The values of $\alpha$ and the corresponding differentiable functions $A(t)$ determined by the additional requirement (d) are


A() = Sa, l. type I, 


Kop 
1-k 


A(t) = +A, a = kpin, type II. 


We distinguish the non-circular case, where $\alpha = 0$, $A(t) = A = \text{const.}$, and the circular case, where $\alpha \neq 0$.


This terminology is chosen because the non-circular case is closely related to the usual non-circular time series, and the circular case is a generalization of the circular time series introduced by Hotelling to simplify the distribution problem. The non-circular cases of both types are Markov, and the circular cases are non-Markov. Type I processes in either case are non-stationary, and type II processes in either case are stationary.


4. JOINT DISTRIBUTION OF THE PROCESS 


We assume henceforth that $X(t)$ is a Gaussian, non-circular, serially stationary process. (In a later paper, we hope to consider the general circular case, which is not in all respects easier than the non-circular.) Let us now obtain the joint distribution of $X(t_1), \ldots, X(t_{n+1})$ for $t_1 < t_2 < \cdots < t_{n+1}$.


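It is convenient to define

$v_1(t) = E[X^2(t)] = \sigma^2 t + 2A$ (type I), $\sigma^2 + 2A$ (type II).

By our basic assumptions, we have

$$\Pr\{X(t_1) < x_1,\ X(t_2) - a(t_2 - t_1)X(t_1) < x_2,\ \ldots,\ X(t_{n+1}) - a(t_{n+1} - t_n)X(t_n) < x_{n+1}\}$$
$$= (2\pi)^{-(n+1)/2}\Big[v_1(t_1)\prod_{k=1}^{n}v(t_{k+1} - t_k)\Big]^{-1/2}\int_{-\infty}^{x_1}\!\cdots\!\int_{-\infty}^{x_{n+1}}\exp\Big[-\frac{u_1^2}{2v_1(t_1)} - \sum_{k=1}^{n}\frac{u_{k+1}^2}{2v(t_{k+1} - t_k)}\Big]du_1\cdots du_{n+1}. \quad (1)$$

The matrix of the transformation from

$X(t_1),\ X(t_2) - a(t_2 - t_1)X(t_1),\ \ldots,\ X(t_{n+1}) - a(t_{n+1} - t_n)X(t_n)$

to $X(t_1), X(t_2), \ldots, X(t_{n+1})$ has zeros above the main diagonal, and ones on the diagonal. Its determinant, the Jacobian of the transformation, is therefore equal to 1. Hence, the joint distribution $F_{t_1, \ldots, t_{n+1}}$ of $X(t_1), \ldots, X(t_{n+1})$ is

$$F_{t_1,\ldots,t_{n+1}}(x_1, \ldots, x_{n+1}) = (2\pi)^{-(n+1)/2}\Big[v_1(t_1)\prod_{k=1}^{n}v(t_{k+1} - t_k)\Big]^{-1/2}\int_{-\infty}^{x_1}\!\cdots\!\int_{-\infty}^{x_{n+1}}M(s_1, \ldots, s_{n+1})\,ds_1\cdots ds_{n+1}, \quad (2)$$

where
$$M(s_1, \ldots, s_{n+1}) = \exp\Big[-\frac{s_1^2}{2v_1(t_1)} - \sum_{k=1}^{n}\frac{\{s_{k+1} - a(t_{k+1} - t_k)s_k\}^2}{2v(t_{k+1} - t_k)}\Big]. \quad (3)$$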

We say that the process is sampled at a constant rate if $t_{k+1} - t_k = h > 0$ for $k = 1, 2, \ldots, n$. We then have $t_{k+1} = t_1 + kT/n$, where $T = nh$ is the length of time over which the process is observed. It is customary and convenient to assume a constant sampling rate, and we make this simplification henceforth. The normalizing constant in (2) then becomes

$$(2\pi)^{-(n+1)/2}[v_1(t_1)]^{-1/2}[v(h)]^{-n/2},$$

and the integrand M becomes

$$M(s_1, \ldots, s_{n+1}) = \exp\Big[-\Big\{\frac{1}{2v_1(t_1)} + \frac{a^2(h)}{2v(h)}\Big\}s_1^2 - \frac{1 + a^2(h)}{2v(h)}\sum_{k=2}^{n}s_k^2 - \frac{1}{2v(h)}s_{n+1}^2 + \frac{a(h)}{v(h)}\sum_{k=1}^{n}s_ks_{k+1}\Big]. \quad (4)$$

Note that M depends on the four quadratic forms $s_1^2$, $\sum_{k=2}^{n}s_k^2$, $s_{n+1}^2$, $\sum_{k=1}^{n}s_ks_{k+1}$. The usual treatment of quadratic forms in mathematical statistics depends on simultaneous diagonability. It is easy to show that the only linear combinations $as_1^2 + b\sum_{k=2}^{n}s_k^2 + cs_{n+1}^2$ simultaneously diagonable with $\sum_{k=1}^{n}s_ks_{k+1}$ are those for which $a = b = c$. Referring to (4), this condition becomes

$$\frac{1}{2v_1(t_1)} + \frac{a^2(h)}{2v(h)} = \frac{1 + a^2(h)}{2v(h)} = \frac{1}{2v(h)},$$

which holds only in the situation $a(h) = 0$, $v_1(t_1) = v(h)$. We see that the exponent in M can be diagonalized when and only when $\rho = 0$, $A = 0$. This possibility has already been exploited by Koopmans (1942) and Von Neumann (1941) in testing the hypothesis $\rho = 0$. The motive for the circular case is also, of course, simultaneous diagonability. Fortunately, the Kac method enables us to dispense with this.


5. MOMENT-GENERATING FUNCTIONS 


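The statistics proposed for the estimation of $\rho$ and $\sigma^2$ in the non-circular case (except for those of Durbin & Watson, 1951) have been functions of the quadratic forms

$$P_1 = X^2(t_1),\quad P_2 = X^2(t_{n+1}),\quad P_3 = \sum_{j=2}^{n}X^2(t_j),\quad Q = \sum_{j=1}^{n}X(t_j)X(t_{j+1}). \quad (5)$$

We see from the form of the integrand M that these are the most natural. Let

$$U_j = e_{1j}P_1 + e_{2j}P_2 + e_{3j}P_3 + e_{4j}Q \quad (j = 1, 2, \ldots), \quad (6)$$

where the $e_{ij}$ are arbitrary real constants. We wish to calculate the joint moment generating function

$$\Phi(\theta_1, \theta_2, \ldots) = E\Big[\exp\Big(-\sum_j\theta_jU_j\Big)\Big] = E[\exp(-z_1P_1 - z_2P_2 - z_3P_3 - wQ)],$$

where $z_i = \sum_j\theta_je_{ij}$ $(i = 1, 2, 3)$ and $w = \sum_j\theta_je_{4j}$. (7)

Clearly $\Phi$ is given by

$$\Phi(\theta_1, \theta_2, \ldots) = C\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty}M_\theta(s_1, \ldots, s_{n+1})\,ds_1\cdots ds_{n+1}, \quad (8)$$

where
$$C = (2\pi)^{-(n+1)/2}[v_1(t_1)]^{-1/2}[v(h)]^{-n/2}, \quad (9)$$
$$M_\theta(s_1, \ldots, s_{n+1}) = M(s_1, \ldots, s_{n+1})\exp\Big(-z_1s_1^2 - z_2s_{n+1}^2 - z_3\sum_{k=2}^{n}s_k^2 - w\sum_{k=1}^{n}s_ks_{k+1}\Big). \quad (10)$$

The possibility of applying the method of Kac rests on factorizing $M_\theta(s_1, \ldots, s_{n+1})$ into the form $K_0(s_1, s_{n+1})K_1(s_1, s_2)K_2(s_2, s_3)\cdots K_n(s_n, s_{n+1})$, with $K_j(x, y) = K(x, y) = K(y, x)$ $(j = 1, \ldots, n)$ and $K_0(x, y) = J(x)L(y)$, where

$$J(x) = \exp\{-c_2x^2\},\quad K(x, y) = \exp[-c_1x^2 + dxy - c_1y^2],\quad L(y) = \exp\{-c_3y^2\}. \quad (11)$$

The coefficients $c_1, c_2, c_3, d$ are given in terms of $a(h)$, $v(h)$, $v_1(t_1)$, $z_1$, $z_2$, $z_3$, $w$ by

$$c_1 = \tfrac12z_3 + \frac{1 + a^2(h)}{4v(h)},\qquad c_2 = z_1 - \tfrac12z_3 + \frac{1}{2v_1(t_1)} + \frac{a^2(h) - 1}{4v(h)},$$
$$c_3 = z_2 - \tfrac12z_3 + \frac{1 - a^2(h)}{4v(h)},\qquad d = \frac{a(h)}{v(h)} - w. \quad (12)$$

Thus we have
$$M_\theta(s_1, \ldots, s_{n+1}) = J(s_1)K(s_1, s_2)\cdots K(s_n, s_{n+1})L(s_{n+1}), \quad (13)$$
where J, K, L are defined by (11), (12).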


6. THE METHOD OF KAC AND A GENERALIZATION 

Kac (1946) and Kac & Erdős (1946) devised a very ingenious method for calculating multiple integrals, which he used to great effect in the theory of Wiener functionals. Because of the frequent occurrence of intractable multiple integrals in mathematical statistics, it might be of interest to generalize the Kac method by separating out its essence* from the particular device that Kac used in simplifying the calculations.

Let $A(x_1, \ldots, x_{n+1})$ be the integrand of a multiple integral

$$I = \int\!\cdots\!\int A(x_1, \ldots, x_{n+1})\,dx_1\cdots dx_{n+1},$$

which it is desired to calculate. If A can be factorized as a product

$$A(x_1, \ldots, x_{n+1}) = A_1(x_1)\cdots A_{n+1}(x_{n+1}),$$

the problem is simplified in a familiar fashion. Suppose, however, that A can be factorized as a product of functions of pairs as follows:

$$A(x_1, \ldots, x_{n+1}) = K_0(x_1, x_{n+1})\prod_{j=1}^{n}K_j(x_j, x_{j+1}).$$

Under the further conditions

$$K_0(x, y) = J(x),\qquad K_j(x, y) = K(x, y) = K(y, x)\quad\text{for } j \ge 1,$$

Kac showed how the integral I could be calculated. The essence of the method is to expand $K_0, K_1, \ldots, K_n$ into bi-orthonormal series of functions $\{\phi_p\}$, and then interchange integration and summation. The further simplification effected by Kac comes from choosing the functions $\{\phi_p\}$ as the orthonormal eigenfunctions of the symmetric kernel K, and expanding J in orthonormal series.


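Let $\{\phi_p\}$ be a complete set of orthonormal functions, so that

$$\int\phi_j(x)\phi_k(x)\,dx = \delta_{jk} = \begin{cases}1, & j = k,\\ 0, & j \neq k,\end{cases}$$

and every square integrable function g possesses an orthonormal expansion

$$g(y) \sim \sum_p\Big[\int g(x)\phi_p(x)\,dx\Big]\phi_p(y),$$

converging in mean square to $g(y)$.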


* The author is indebted to the referee for pointing out the possibility of such a separation. 
We formally write

$$K_j(x, y) \sim \sum_p\sum_q c^{(j)}_{pq}\phi_p(x)\phi_q(y),$$

where $c^{(j)}_{pq} = \iint K_j(x, y)\phi_p(x)\phi_q(y)\,dx\,dy$ for $j = 0, 1, \ldots, n$; $p, q = 1, 2, \ldots$. Let $C_j$ be the matrix $[c^{(j)}_{pq}]$ of bi-orthonormal coefficients of $K_j$, and let $C = C_1C_2\cdots C_n$. If summation and integration can be interchanged, we have

$$H_n(x_1, x_{n+1}) = \int\!\cdots\!\int K_1(x_1, x_2)\cdots K_n(x_n, x_{n+1})\,dx_2\cdots dx_n \sim \sum_p\sum_q(C)_{pq}\phi_p(x_1)\phi_q(x_{n+1}).$$

Hence we find

$$I = \iint K_0(x_1, x_{n+1})H_n(x_1, x_{n+1})\,dx_1\,dx_{n+1} = \sum_p\sum_q c^{(0)}_{pq}(C)_{pq} = \operatorname{tr}(C_0'C) = \operatorname{tr}(C'C_0).$$

The above formula constitutes our generalization of the Kac method.
We now return to the Kac method proper. If $K_j(x, y) = K(x, y) = K(y, x) \neq 0$, then by Hilbert-Schmidt theory the symmetric integral operator $T(f)(x) = \int K(x, y)f(y)\,dy$ possesses a non-empty set of eigenfunctions $\{\phi_p\}$ and eigenvalues $\{\lambda_p\}$, for which $T\phi_p = \lambda_p\phi_p$. The function

$$\int\!\cdots\!\int K(x_1, x_2)K(x_2, x_3)\cdots K(x_n, x_{n+1})\,dx_2\cdots dx_n$$

is none other than the iterated kernel $K^{(n)}(x_1, x_{n+1})$, which for $n > 1$ possesses the well-behaved expansion $K^{(n)}(x_1, x_{n+1}) = \sum_p\lambda_p^n\phi_p(x_1)\phi_p(x_{n+1})$.

If $K_0(x, y) = J(x)L(y)$, and the eigenfunctions $\{\phi_p\}$ are complete, we write

$$J(x) \sim \sum_p\mu_p\phi_p(x),\qquad L(y) \sim \sum_p\nu_p\phi_p(y),$$

where $\mu_p = \int J(x)\phi_p(x)\,dx$, $\nu_p = \int L(y)\phi_p(y)\,dy$. Since

$$I = \iint J(x_1)K^{(n)}(x_1, x_{n+1})L(x_{n+1})\,dx_1\,dx_{n+1},$$

we have finally $I = \sum_p\mu_p\lambda_p^n\nu_p$.
id 
Kac considered the case $L(y) = 1$, $\nu_p = \int\phi_p(y)\,dy$. For our calculations, we need the slight increase in generality indicated above, as can be seen from the expression (13).

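If $K_0(x, y)$ does not factorize as $J(x)L(y)$, we use the bi-orthonormal expansion $K_0(x, y) \sim \sum_p\sum_q c^{(0)}_{pq}\phi_p(x)\phi_q(y)$. Then $I = \sum_p c^{(0)}_{pp}\lambda_p^n = \operatorname{tr}(C_0\Lambda^n)$, where $\Lambda$ is the diagonal matrix of eigenvalues. This formula is of use in calculating the moment generating function in the circular case.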
7. APPLICATION OF THE KAC METHOD
We now apply the above considerations to the calculation of $\Phi$. From (11) we have the symmetric kernel

$$K(x, y) = \exp(-c_1x^2 + dxy - c_1y^2), \quad (14)$$

for which we wish to determine the corresponding eigenvalues and eigenfunctions. Using the normalization of the Hermite functions $h_p(x) = H_p(x)e^{-x^2/2}$, we have from the Mehler generating function that

$$\exp[-\tfrac12r^2(x^2 + y^2)]\exp\Big\{-\frac{r^2[u^2(x^2 + y^2) - 2uxy]}{1 - u^2}\Big\} = (1 - u^2)^{1/2}\sum_{p=0}^{\infty}\frac{u^p}{2^pp!}h_p(rx)h_p(ry). \quad (15)$$

On comparing the left-hand side of (15) with (14), we find that a suitable choice of r and u gives a bi-orthogonal expansion for $K(x, y)$. This choice is determined by the equations

$$c_1 = \frac{r^2(1 + u^2)}{2(1 - u^2)},\qquad d = \frac{2ur^2}{1 - u^2}.$$

The solution we seek is

$$r = (4c_1^2 - d^2)^{1/4},\qquad u = \frac{2c_1 - r^2}{d}. \quad (17)$$

This is satisfactory in case $r \neq 0$ and $|u| < 1$ (so that $u \neq \pm1$). From the known properties of Hermite functions we find

$$\int_{-\infty}^{\infty}K(x, y)h_p(ry)\,dy = \pi^{1/2}r^{-1}u^p(1 - u^2)^{1/2}h_p(rx),$$

so that the eigenvalues are

$$\lambda_p = \pi^{1/2}r^{-1}u^p(1 - u^2)^{1/2}, \quad (19)$$

and the normalized eigenfunctions are

$$\psi_p(x) = \pi^{-1/4}2^{-p/2}(p!)^{-1/2}r^{1/2}h_p(rx).$$

The Hermite functions are known to be complete in $L_2(-\infty, \infty)$, so that $\{\psi_p\}$ is complete for $r \neq 0$.
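The eigen-solution can be checked numerically. The following Python sketch (standard library only; the kernel coefficients $c_1 = 1$, $d = 0.8$ and the helper names are arbitrary choices for illustration) applies the kernel $K(x, y) = \exp(-c_1x^2 + dxy - c_1y^2)$ to the functions $\pi^{-1/4}2^{-p/2}(p!)^{-1/2}r^{1/2}h_p(rx)$ by the trapezoidal rule and compares the result with $\lambda_p = \pi^{1/2}r^{-1}u^p(1 - u^2)^{1/2}$ times the same function, where $r = (4c_1^2 - d^2)^{1/4}$ and $u = (2c_1 - r^2)/d$.

```python
import math


def hermite(p, z):
    """Physicists' Hermite polynomial H_p(z) from H_{k+1} = 2z H_k - 2k H_{k-1}."""
    if p == 0:
        return 1.0
    h_prev, h = 1.0, 2.0 * z
    for k in range(1, p):
        h_prev, h = h, 2.0 * z * h - 2.0 * k * h_prev
    return h


def r_and_u(c1, d):
    """r = (4 c1^2 - d^2)^(1/4), u = (2 c1 - r^2) / d; assumes 4 c1^2 > d^2 and d != 0."""
    r = (4.0 * c1 * c1 - d * d) ** 0.25
    return r, (2.0 * c1 - r * r) / d


def psi(p, x, r):
    """Normalized eigenfunction pi^(-1/4) 2^(-p/2) (p!)^(-1/2) r^(1/2) h_p(r x)."""
    z = r * x
    norm = math.sqrt(r) / (math.pi ** 0.25 * math.sqrt(2.0 ** p * math.factorial(p)))
    return norm * hermite(p, z) * math.exp(-0.5 * z * z)


def eigenvalue(p, r, u):
    """lambda_p = pi^(1/2) r^(-1) u^p (1 - u^2)^(1/2)."""
    return math.sqrt(math.pi) / r * u ** p * math.sqrt(1.0 - u * u)


def integrate(f, lo=-12.0, hi=12.0, m=4801):
    """Trapezoidal rule; very accurate for smooth integrands with Gaussian tails."""
    step = (hi - lo) / (m - 1)
    inner = sum(f(lo + i * step) for i in range(1, m - 1))
    return step * (inner + 0.5 * (f(lo) + f(hi)))


if __name__ == "__main__":
    c1, d = 1.0, 0.8                      # illustrative kernel coefficients
    r, u = r_and_u(c1, d)
    x = 0.7
    for p in range(4):
        lhs = integrate(lambda y: math.exp(-c1 * x * x + d * x * y - c1 * y * y) * psi(p, y, r))
        print(p, lhs, eigenvalue(p, r, u) * psi(p, x, r))   # the two columns should agree
```

Since the eigenvalues form the geometric sequence $\lambda_0u^p$, only the first few terms matter in any sum of the form $\sum_p\mu_p\lambda_p^n\nu_p$ once n is moderately large.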


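We now calculate the coefficients

$$\mu_p = \int_{-\infty}^{\infty}J(x)\psi_p(x)\,dx,\qquad \nu_p = \int_{-\infty}^{\infty}L(y)\psi_p(y)\,dy$$

in the expansions of J and L of (11).

The labour involved is small, since J and L have the same simple exponential form. Multiplication of the generating function of the Hermite polynomials by $\exp(-\gamma x^2)$, termwise integration, and coefficient matching yields

$$\int_{-\infty}^{\infty}e^{-\gamma x^2}H_p(rx)\,dx = \begin{cases}(\pi/\gamma)^{1/2}\,\dfrac{p!}{(\frac12p)!}\,(r^2/\gamma - 1)^{p/2}, & p\ \text{even},\\[4pt] 0, & p\ \text{odd},\end{cases}$$

and therefore

$$\int_{-\infty}^{\infty}e^{-qx^2}\psi_p(x)\,dx = \begin{cases}\Big(\dfrac{2\pi^{1/2}r}{r^2 + 2q}\Big)^{1/2}\dfrac{(p!)^{1/2}}{(\frac12p)!}\Big[\dfrac{r^2 - 2q}{2(r^2 + 2q)}\Big]^{p/2}, & p\ \text{even},\\[4pt] 0, & p\ \text{odd}.\end{cases}$$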
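From (11) and (21) we find

$$\mu_p\nu_p = \begin{cases}\pi^{1/2}r\,\alpha\,\dfrac{p!}{[(\frac12p)!]^2}\,(\tfrac14\beta)^{p/2}, & p\ \text{even},\\[4pt] 0, & p\ \text{odd},\end{cases} \quad (24)$$

where

$$\alpha = 2[(r^2 + 2c_2)(r^2 + 2c_3)]^{-1/2},\qquad \beta = \frac{(r^2 - 2c_2)(r^2 - 2c_3)}{(r^2 + 2c_2)(r^2 + 2c_3)}. \quad (25)$$

The series $\sum_p\mu_p\nu_p\lambda_p^n$ for our integral can now be summed. From (24) and (19) we have

$$\sum_p\mu_p\nu_p\lambda_p^n = \alpha\,\pi^{(n+1)/2}r^{1-n}(1 - u^2)^{n/2}\sum_{m=0}^{\infty}\frac{(2m)!}{(m!)^2}(\tfrac14\beta u^{2n})^m. \quad (26)$$

The binomial expansion of $(1 - Z^2)^{-1/2}$ is

$$(1 - Z^2)^{-1/2} = \sum_{m=0}^{\infty}\frac{(2m)!}{(m!)^2}(\tfrac12Z)^{2m}. \quad (27)$$

Comparing (26) and (27) we have from (8) and (9) that

$$\Phi(\theta_1, \theta_2, \ldots) = 2^{-(n+1)/2}\,\alpha\,[v_1(t_1)]^{-1/2}[v(h)]^{-n/2}\,r^{1-n}(1 - u^2)^{n/2}(1 - \beta u^{2n})^{-1/2}, \quad (28)$$

where $\alpha$ and $\beta$ are expressed in terms of $c_2$, $c_3$ and r by (25), u and r in terms of $c_1$ and d by (17), $c_1, c_2, c_3, d$ in terms of $z_1, z_2, z_3, w, a(h), v(h), v_1(t_1)$ by (12), and $z_1, z_2, z_3, w$ in terms of the $\theta_j$ and $e_{ij}$ by (7). Note that $\Phi(\theta_1, \theta_2, \ldots)$ is an algebraic function of $\theta_1, \theta_2, \ldots$ and also of $a(h)$, $v(h)$, $v_1(t_1)$, and the $e_{ij}$.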
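As a check on the summation leading to (28), the Python sketch below compares the closed form $\alpha\pi^{(n+1)/2}r^{1-n}(1 - u^2)^{n/2}(1 - \beta u^{2n})^{-1/2}$, with $\alpha = 2[(r^2 + 2c_2)(r^2 + 2c_3)]^{-1/2}$ and $\beta = (r^2 - 2c_2)(r^2 - 2c_3)/[(r^2 + 2c_2)(r^2 + 2c_3)]$, against a brute-force iterated quadrature of $\int\cdots\int J(x_1)K(x_1, x_2)\cdots K(x_n, x_{n+1})L(x_{n+1})\,dx$. The coefficient values and function names are illustrative assumptions. For n = 1 the integral is an elementary bivariate Gaussian integral, $\pi/\sqrt{(c_1 + c_2)(c_1 + c_3) - d^2/4}$, which gives an independent check.

```python
import math


def kac_closed_form(c1, c2, c3, d, n):
    """alpha * pi^((n+1)/2) * r^(1-n) * (1-u^2)^(n/2) * (1 - beta u^(2n))^(-1/2)."""
    r = (4.0 * c1 * c1 - d * d) ** 0.25
    u = (2.0 * c1 - r * r) / d
    r2 = r * r
    alpha = 2.0 / math.sqrt((r2 + 2.0 * c2) * (r2 + 2.0 * c3))
    beta = (r2 - 2.0 * c2) * (r2 - 2.0 * c3) / ((r2 + 2.0 * c2) * (r2 + 2.0 * c3))
    return (alpha * math.pi ** (0.5 * (n + 1)) * r ** (1 - n)
            * (1.0 - u * u) ** (0.5 * n) / math.sqrt(1.0 - beta * u ** (2 * n)))


def direct_integral(c1, c2, c3, d, n, lo=-8.0, hi=8.0, m=321):
    """Iterated quadrature of J(x_1) K(x_1,x_2) ... K(x_n,x_{n+1}) L(x_{n+1}).

    Plain Riemann sums on a uniform grid; the integrand is negligible at the ends,
    so this coincides with the trapezoidal rule.
    """
    step = (hi - lo) / (m - 1)
    grid = [lo + i * step for i in range(m)]
    kernel = [[math.exp(-c1 * x * x + d * x * y - c1 * y * y) for y in grid] for x in grid]
    f = [math.exp(-c2 * x * x) for x in grid]             # J(x_1)
    for _ in range(n):                                     # integrate out one variable per factor K
        f = [step * sum(f[i] * kernel[i][j] for i in range(m)) for j in range(m)]
    return step * sum(fj * math.exp(-c3 * y * y) for fj, y in zip(f, grid))   # against L


if __name__ == "__main__":
    params = (1.0, 0.3, 0.5, 0.8)                          # c1, c2, c3, d (illustrative)
    for n in (1, 2, 3):
        print(n, kac_closed_form(*params, n), direct_integral(*params, n))
```

The agreement confirms that the interchange of summation and integration is harmless for these Gaussian kernels.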


8. THE UHLENBECK-ORNSTEIN PROCESS
The only stationary Markov members of the set of serially stationary processes are those for which

$$r(t_1, t_2) = \sigma^2\rho^{|t_1 - t_2|},\quad a(h) = \rho^h,\quad v(h) = \sigma^2(1 - \rho^{2h}),\quad v_1(t_1) = \sigma^2.$$

Gaussian processes of this type were introduced by the physicists Uhlenbeck & Ornstein (1930) as models for Brownian motion in a gas. Since the mean is taken as zero, the parameters are $\rho$ and $\sigma$.

Many papers have been written on the estimation of the discrete time series $U_0, U_1, \ldots$ generated by the first-order difference equation $U_{k+1} - \gamma U_k = V_{k+1}$, where $U_0, V_1, V_2, \ldots$ are independently and normally distributed, and $E[V_k^2] = \sigma_1^2$ $(k = 1, 2, \ldots)$. Such a series can be obtained by sampling the U-O process as follows: choose $h > 0$ and $t_1$, and put $U_k = X(t_1 + kh)$ $(k = 0, 1, \ldots)$, $\gamma = \rho^h$, $E[U_0^2] = \sigma^2$, $\sigma_1^2 = \sigma^2(1 - \rho^{2h})$. To study the effect of changing the sampling rate on any physical or economic phenomenon for which the above difference equation is a model, it is necessary to embed the discrete series in the U-O process. Similarly, the effect of changing the sampling rate on a circular discrete process can be analysed by embedding it in a continuous 'circular' process of the type described at the end of § 4.
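A minimal Python sketch of this embedding, assuming a stationary start ($U_0 \sim N(0, \sigma^2)$): the sampled U-O process is generated exactly by the first-order recursion with $\gamma = \rho^h$ and innovation variance $\sigma_1^2 = \sigma^2(1 - \rho^{2h})$. The function name `sample_ou` and the parameter values in the usage lines are illustrative.

```python
import math
import random


def sample_ou(n, h, rho, sigma2, rng):
    """Return [X(t_1), ..., X(t_{n+1})] for a stationary Gaussian U-O process sampled every h.

    Uses U_{k+1} = gamma * U_k + V_{k+1} with gamma = rho**h, U_0 ~ N(0, sigma2)
    and V_k ~ N(0, sigma1^2), where sigma1^2 = sigma2 * (1 - rho**(2h)).
    """
    gamma = rho ** h
    innovation_sd = math.sqrt(sigma2 * (1.0 - gamma * gamma))
    x = [rng.gauss(0.0, math.sqrt(sigma2))]
    for _ in range(n):
        x.append(gamma * x[-1] + rng.gauss(0.0, innovation_sd))
    return x


if __name__ == "__main__":
    rng = random.Random(2024)
    x = sample_ou(100_000, h=0.5, rho=0.5, sigma2=2.0, rng=rng)
    var = sum(v * v for v in x) / len(x)                    # the mean is known to be 0
    lag1 = sum(a * b for a, b in zip(x, x[1:])) / sum(v * v for v in x[:-1])
    print(var, lag1, 0.5 ** 0.5)     # var should be near 2.0, lag1 near gamma = rho**h
```

Changing h while keeping $\rho$ and $\sigma^2$ fixed changes only $\gamma$ and $\sigma_1^2$, which is exactly the effect of a change of sampling rate described above.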

The estimates for $\gamma$ and $\sigma_1^2$ have been quadratic forms in the $U_k$, or quotients of such forms. The distributions of such statistics are extremely complicated; § 2 gives a summary of the methods employed to obtain exact and approximate distributions of such statistics.
As an application of our methods, we will write down the joint moment generating functions of the statistics suitable for estimating $\rho$ and $\sigma^2$, $\gamma$ and $\sigma^2$, or $\gamma$ and $\sigma_1^2$.

The principal statistic proposed for $\sigma^2$ has been, in the notation of § 5,

$$S_1 = \frac{1}{n+1}\sum_{j=1}^{n+1}X^2(t_j) = \frac{P_1 + P_2 + P_3}{n+1}.$$

We shall find that the statistics

$$S_\varepsilon = \frac{I_\varepsilon}{n + 2\varepsilon - 1},\qquad\text{where } I_\varepsilon = \varepsilon(P_1 + P_2) + P_3,$$

and particularly $S_{1/2} = (1/n)[\tfrac12(P_1 + P_2) + P_3]$, have advantages over $S_1$ as estimates of $\sigma^2$. Note that $E[S_\varepsilon] = \sigma^2$ for each $\varepsilon$, so that all the $S_\varepsilon$ are unbiased.
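The family $S_\varepsilon$ is easy to compute from a sample. This Python sketch (helper names are hypothetical) evaluates $P_1, P_2, P_3, Q$ and $S_\varepsilon$; $\varepsilon = 1$ recovers the principal statistic $(P_1 + P_2 + P_3)/(n+1)$, and a small Monte Carlo run over simulated stationary samples illustrates that every $S_\varepsilon$ averages to about $\sigma^2$.

```python
import math
import random


def quadratic_forms(x):
    """P1, P2, P3, Q for x = [X(t_1), ..., X(t_{n+1})]."""
    p1, p2 = x[0] ** 2, x[-1] ** 2
    p3 = sum(v * v for v in x[1:-1])
    q = sum(a * b for a, b in zip(x, x[1:]))
    return p1, p2, p3, q


def s_eps(x, eps):
    """S_eps = I_eps / (n + 2 eps - 1) with I_eps = eps (P1 + P2) + P3."""
    n = len(x) - 1
    p1, p2, p3, _ = quadratic_forms(x)
    return (eps * (p1 + p2) + p3) / (n + 2.0 * eps - 1.0)


def mean_of_s_eps(eps, n, rho_h, sigma2, reps, rng):
    """Monte Carlo average of S_eps over stationary samples with gamma = rho**h = rho_h."""
    sd_innov = math.sqrt(sigma2 * (1.0 - rho_h ** 2))
    total = 0.0
    for _ in range(reps):
        x = [rng.gauss(0.0, math.sqrt(sigma2))]
        for _ in range(n):
            x.append(rho_h * x[-1] + rng.gauss(0.0, sd_innov))
        total += s_eps(x, eps)
    return total / reps


if __name__ == "__main__":
    rng = random.Random(11)
    for eps in (0.0, 0.5, 1.0):
        print(eps, mean_of_s_eps(eps, 20, 0.5, 1.0, 2000, rng))   # each close to 1.0
```

The denominators $n + 2\varepsilon - 1$ are exactly the expected number of unit-variance terms in $I_\varepsilon$, which is why no bias correction depending on $\rho$ is needed.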

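Many estimates for $\gamma = \rho^h$ have been proposed. The ordinary correlation coefficient between $X(t_1), \ldots, X(t_n)$ and $X(t_2), \ldots, X(t_{n+1})$, which can be written as $Q[(P_1 + P_3)(P_2 + P_3)]^{-1/2}$, is very awkward. The serial correlation coefficient, defined as

$$\frac{\sum_{j=1}^{n}X(t_j)X(t_{j+1})}{\sum_{j=1}^{n+1}X^2(t_j)} = \frac{Q}{I_1},$$

$$f_{t_1,\ldots,t_{n+1}}(u_1, \ldots, u_{n+1}) = (2\pi\sigma^2)^{-(n+1)/2}(1 - \rho^{2h})^{-n/2}\exp\Big[-\frac{P_1 + P_2 + (1 + \rho^{2h})P_3 - 2\rho^hQ}{2\sigma^2(1 - \rho^{2h})}\Big]. \quad (29)$$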


Thus, the three quadratic forms $P_1 + P_2$, $P_3$, $Q$ are the only characteristics of the sample that enter the joint distribution, and form a set of sufficient statistics for the estimation of $\rho$ and $\sigma^2$, as Koopmans (1942) pointed out.

We find that the maximum likelihood estimates R and S of $\rho^h$ and $\sigma^2$ satisfy the equations


nS(1— R?) = 55 (30) 
$(n+1)S(1 - R^2) = P_1 + P_2 + (1 + R^2)P_3 - 2RQ$.

We find $S = Q/R - P_3$ on elimination, and substitution yields a cubic equation for R with coefficients depending on $n$, $P_1 + P_2$, $P_3$, $Q$. If $S(1 - R^2)$ is negligible compared to
nS(1 — R?), we obtain the approximation R~ /H. 
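The elimination can be carried out explicitly. Differentiating the log-likelihood of (29) and using $S = Q/R - P_3$ leads (by our own algebra, not quoted from the paper) to the cubic $nP_3R^3 - (n-1)QR^2 - [(n+2)P_3 + P_1 + P_2]R + (n+1)Q = 0$. Its value at $R = -1$ is $\sum_j[X(t_j) + X(t_{j+1})]^2 > 0$ and at $R = 1$ it is minus the sum of squared successive differences, so a root in $(-1, 1)$ can be found by bisection. The Python sketch below does this; the function names are hypothetical.

```python
import random


def quadratic_forms(x):
    """P1, P2, P3, Q for x = [X(t_1), ..., X(t_{n+1})]."""
    p1, p2 = x[0] ** 2, x[-1] ** 2
    p3 = sum(v * v for v in x[1:-1])
    q = sum(a * b for a, b in zip(x, x[1:]))
    return p1, p2, p3, q


def ml_estimates(x, tol=1e-15):
    """Maximum likelihood R (for rho**h) and S (for sigma**2) from the likelihood equations."""
    n = len(x) - 1
    p1, p2, p3, q = quadratic_forms(x)

    def cubic(g):
        # Left-hand side of the cubic obtained by eliminating S.
        return (n * p3 * g ** 3 - (n - 1) * q * g ** 2
                - ((n + 2) * p3 + p1 + p2) * g + (n + 1) * q)

    lo, hi = -1.0, 1.0          # cubic(lo) > 0 and cubic(hi) <= 0 throughout
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cubic(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    r = 0.5 * (lo + hi)
    s = (p1 + p2 + (1.0 + r * r) * p3 - 2.0 * r * q) / ((n + 1) * (1.0 - r * r))
    return r, s, (p1, p2, p3, q)


if __name__ == "__main__":
    rng = random.Random(7)
    gamma, n = 0.6, 400
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n):
        x.append(gamma * x[-1] + rng.gauss(0.0, (1 - gamma ** 2) ** 0.5))
    r, s, (p1, p2, p3, q) = ml_estimates(x)
    print(r, s, q / r - p3)     # the last two agree: S = Q/R - P3 at the root
```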

Another type of estimate, due to Von Neumann, has many desirable properties. Let

$$N = \sum_{j=1}^{n}[X(t_{j+1}) - X(t_j)]^2 = P_1 + P_2 + 2P_3 - 2Q.$$
$\rho^h$. Since N is a quadratic form instead of a quotient, the calculation burdens are less than

$$E[N/T] = 2\sigma^2(1 - \rho^h)/h,\qquad \lim_{h\to0}E[N/T] = -2\sigma^2\log\rho.$$

Savage proved the strong results that $\operatorname{plim}N/T = -2\sigma^2\log\rho$, and that if $\int_0^T[dX(t)]^2$ is defined as the limit in measure of N as $h \to 0$, then $\int_0^T[dX(t)]^2 = -2\sigma^2T\log\rho$ with probability 1. Rubin (1947) extended this by sampling with a variable rate, weakening the Gaussian character, and generalizing the covariance function of the process. We will find that $I_{1/2}$, N, Q have the simplest moment generating functions among the quadratic statistics for the U-O process.

Moreover, the intra-class correlation $1 - \tfrac12N/I_{1/2}$ ($= Q/I_{1/2}$) is an estimate of $\rho^h$ for unknown $\sigma^2$ with a simpler distribution than $Q/I_0$ or $Q/I_1$.
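The identity $N = P_1 + P_2 + 2P_3 - 2Q = 2I_{1/2} - 2Q$ is what makes $1 - \tfrac12N/I_{1/2}$ equal to $Q/I_{1/2}$. The Python sketch below (function names are illustrative) computes N and the intra-class correlation, and evaluates $E[N/T] = 2\sigma^2(1 - \rho^h)/h$, whose limit as $h \to 0$ is $-2\sigma^2\log\rho$.

```python
import math


def von_neumann_n(x):
    """N = sum of squared successive differences of x = [X(t_1), ..., X(t_{n+1})]."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:]))


def intra_class(x):
    """1 - N / (2 I_{1/2}), where I_{1/2} = (P1 + P2)/2 + P3."""
    p1, p2 = x[0] ** 2, x[-1] ** 2
    p3 = sum(v * v for v in x[1:-1])
    i_half = 0.5 * (p1 + p2) + p3
    return 1.0 - 0.5 * von_neumann_n(x) / i_half


def expected_n_over_t(sigma2, rho, h):
    """E[N/T] = 2 sigma^2 (1 - rho^h) / h for the stationary U-O process."""
    return 2.0 * sigma2 * (1.0 - rho ** h) / h


if __name__ == "__main__":
    for h in (1.0, 0.1, 0.01, 0.001):
        print(h, expected_n_over_t(1.0, 0.5, h))
    print("limit", -2.0 * math.log(0.5))
```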

We now specialize the fundamental formula (28) to obtain the joint moment generating function $\Phi(\theta_1, \theta_2)$ of $I_\varepsilon = \varepsilon(P_1 + P_2) + P_3$ and $Q$.


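We have

$$\Phi(\theta_1, \theta_2) = (2\sigma^2)^{-(n+1)/2}(1 - \rho^{2h})^{-n/2}\,r^{1-n}\,\alpha\,(1 - u^2)^{n/2}(1 - \beta u^{2n})^{-1/2}, \quad (32)$$

where $\alpha$, $\beta$ are given by (25) and r, u by (17), with

$$c_1 = \tfrac12\theta_1 + \frac{1 + \rho^{2h}}{4\sigma^2(1 - \rho^{2h})},\qquad c_2 = c_3 = (\varepsilon - \tfrac12)\theta_1 + \frac{1}{4\sigma^2},\qquad d = \frac{\rho^h}{\sigma^2(1 - \rho^{2h})} - \theta_2. \quad (33)$$

The advantage of taking $\varepsilon = \tfrac12$ is apparent. Note that $\Phi(\theta_1, \theta_2)$ can be extended into the complex domain by analytic continuation.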
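Because (32)-(33) are explicit, the moment generating function can be checked numerically. In this Python sketch the formula is evaluated under the assumption that $\Phi(\theta_1, \theta_2) = E[\exp(-\theta_1I_\varepsilon - \theta_2Q)]$, with $c_2 = c_3 = (\varepsilon - \tfrac12)\theta_1 + 1/(4\sigma^2)$, so that $\alpha = 2/(r^2 + 2c_2)$ and $\beta = [(r^2 - 2c_2)/(r^2 + 2c_2)]^2$. The argument `rho_h` stands for $\rho^h$, and the parameter values are illustrative. Central differences at the origin should recover $E[I_\varepsilon] = (n + 2\varepsilon - 1)\sigma^2$ and $E[Q] = n\sigma^2\rho^h$. The real-valued formulas need $4c_1^2 > d^2$, which holds for $\theta$ near 0.

```python
import math


def mgf_i_q(theta1, theta2, n, eps, sigma2, rho_h):
    """Phi(theta1, theta2) = E[exp(-theta1 I_eps - theta2 Q)] for the sampled U-O process."""
    g2 = rho_h * rho_h
    v = sigma2 * (1.0 - g2)                                   # v(h)
    c1 = 0.5 * theta1 + (1.0 + g2) / (4.0 * v)
    c = (eps - 0.5) * theta1 + 1.0 / (4.0 * sigma2)            # c2 = c3
    d = rho_h / v - theta2
    r = (4.0 * c1 * c1 - d * d) ** 0.25
    u = (2.0 * c1 - r * r) / d
    r2 = r * r
    alpha = 2.0 / (r2 + 2.0 * c)
    beta = ((r2 - 2.0 * c) / (r2 + 2.0 * c)) ** 2
    return ((2.0 * sigma2) ** (-0.5 * (n + 1)) * (1.0 - g2) ** (-0.5 * n)
            * r ** (1 - n) * alpha * (1.0 - u * u) ** (0.5 * n)
            / math.sqrt(1.0 - beta * u ** (2 * n)))


def minus_derivative(f, step=1e-5):
    """Central-difference estimate of -f'(0)."""
    return -(f(step) - f(-step)) / (2.0 * step)


if __name__ == "__main__":
    n, sigma2, rho_h = 10, 2.0, 0.6
    for eps in (0.0, 0.5, 1.0):
        mean_i = minus_derivative(lambda t: mgf_i_q(t, 0.0, n, eps, sigma2, rho_h))
        print(eps, mean_i, (n + 2 * eps - 1) * sigma2)
    print(minus_derivative(lambda t: mgf_i_q(0.0, t, n, 0.5, sigma2, rho_h)), n * sigma2 * rho_h)
```

For n = 1 the formula can also be compared with the elementary bivariate result $E[\exp(-tX_1X_2)] = [(1 + t\sigma^2\rho^h)^2 - t^2\sigma^4]^{-1/2}$.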
The moment generating functions $\Psi_I$, $\Psi_N$, $\Psi_Q$ of $I_\varepsilon$, N, Q are easily expressed in terms of $\Phi$; for example, $\Psi_I(\theta) = \Phi(\theta, 0)$ and $\Psi_Q(\theta) = \Phi(0, \theta)$.

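The means of $I_\varepsilon$, N, Q are known already:

$$E[I_\varepsilon] = (n + 2\varepsilon - 1)\sigma^2,\qquad E[S_\varepsilon] = \sigma^2,$$
$$E[N] = 2n\sigma^2(1 - \rho^h), \quad (35)$$
$$E[Q] = n\sigma^2\rho^h.$$

The variance of $I_\varepsilon$ is given by

$$\operatorname{var}[I_\varepsilon] = \Big[\frac{\partial^2}{\partial\theta^2}\log\Phi(\theta, 0)\Big]_{\theta=0}, \quad (36)$$

and the variances of N and Q are given by similar formulas.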
After a very long and obnoxious calculation we find
(arg 0 p”) - (1 99) («y 
+ 4e, (1 — p?n^) (1 — pt^) + 4ej(1 + (1p, (37) 


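where $\varepsilon_1 = \varepsilon - \tfrac12$. Since $S_\varepsilon = I_\varepsilon/(n + 2\varepsilon_1)$, we have $\operatorname{var}[S_\varepsilon] = \operatorname{var}[I_\varepsilon]/(n + 2\varepsilon_1)^2$.
In particular we have for $\varepsilon = 0, \tfrac12, 1$ the variances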


var [I,] = 


var lS) = qe, 
morc cuc) (88) 
Similar calculations for N and Q give 
Mole m sme DG +0% + 4h) + ( — 1)], (39) 
He me) a 


The distribution of $Q/I_\varepsilon$ can be calculated from the joint moment generating function $\Phi(\theta_1, \theta_2)$ of $I_\varepsilon$ and Q by the inversion formula of Gurland (1948), by which

$$H(a^+) + H(a^-) = 1 - \frac{1}{\pi i}\lim_{\delta\to0}\int_\delta^{1/\delta}\frac{\Phi(ita, -it) - \Phi(-ita, it)}{t}\,dt, \quad (41)$$

where $H(a) = \Pr[Q/I_\varepsilon < a]$. An explicit expression can be better obtained by other methods (see Daniels, 1956).

The variance results (37), (39), (40) yield a number of interesting consequences, summarized in the appended table. Seven limiting situations are considered for each of the five statistics $N/T$, $S_\varepsilon$, $Q/n$, $Q/I_\varepsilon$, $N/(TS_\varepsilon)$. In the upper half of the double row devoted to each of the first three statistics is found the limiting variance, and in the lower half the constant to which the statistic tends in probability, if such exists. If the limiting variance is positive and finite, we may plausibly suppose that a limiting distribution exists. For each of the last two statistics the single row contains such conclusions as can be inferred from the behaviour of the first three.

If $\sigma^2$ is known, $N/T$ is a much better estimate of $\rho$ (or rather, of $-2\sigma^2\log\rho$) than $Q/n$ is of $\sigma^2\rho^h$ or $Q/I_\varepsilon$ of $\rho^h$, since it is consistent under a much wider variety of conditions. If $\sigma^2$ is unknown, the estimates $N/(TS_\varepsilon) = (n + 2\varepsilon - 1)N/(TI_\varepsilon)$ appear to be better estimates of $-2\log\rho$ than $Q/I_\varepsilon$ is of $\rho^h$. Of these, $N/(TS_{1/2}) = N/(hI_{1/2})$ would appear to have the simplest
distribution. Von Neumann (1941) himself chose to investigate the distribution of M/ r for 
the discrete non-circular time series with p = 0, for testing the hypothesis of uncorrelation. 

From the fundamental formula (28), we can write the joint moment generating function $\Phi_N$ of $I_\varepsilon = \varepsilon(P_1 + P_2) + P_3$ and $N = P_1 + P_2 + 2P_3 - 2Q$ as


$$\Phi_N(\theta_1, \theta_2) = (2\sigma^2)^{-(n+1)/2}(1 - \rho^{2h})^{-n/2}\,r^{1-n}\,\alpha\,(1 - u^2)^{n/2}(1 - \beta u^{2n})^{-1/2},$$
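where $\alpha$ and $\beta$ are expressed in terms of $c_2$, $c_3$ and r by (25), u and r in terms of $c_1$, d by (17), and

$$c_1 = \tfrac12\theta_1 + \theta_2 + \frac{1 + \rho^{2h}}{4\sigma^2(1 - \rho^{2h})},\qquad c_2 = c_3 = (\varepsilon - \tfrac12)\theta_1 + \frac{1}{4\sigma^2},\qquad d = 2\theta_2 + \frac{\rho^h}{\sigma^2(1 - \rho^{2h})}. \quad (42)$$

The greater simplicity obtained by choosing $\varepsilon = \tfrac12$ is again apparent. Applying the Gurland inversion formula as in (41) to $\Phi_N(\theta_1, \theta_2)$ yields the distribution of the quotient $N/I_\varepsilon$ for arbitrary $\varepsilon$, $\rho$.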
Table 1 


Limit Teil, 
condition p fixed em p fixed p fixed | o- 
h fixed hos Gia * 9 n fixed 
x — n> 
n ch T fixed n fixed | A fixed 
| 
. dot 
Variance 0 0 0 0 —, (8n-1 
* l l m ) 
p. lin. — 2% log p - 20° log p — 26! log p —20* log p lim. dist. | 
Variance | 0 0 c^ na 0 8 1 | 
8, Tlogp (T log p) (n+ 26) 
p. lim. gt g? lim, dist. g lim, dist. 
A 2 1—p* gt 
Variance 0 0 ÜTG — 
"| d Tlogp & log 9) 7 
p. lim. o*ph g? lim, dist. lim. dist. 
. p. lim, p 1 lim. dist. lim. dist. 
Mrs, p. lim. —21ogp | —2logp lim. dist. 2287 lim. dist. 


The question of which $S_\varepsilon$, for $\varepsilon$ in [0, 1], gives the best estimate of $\sigma^2$ is also of interest. On grounds of theoretical simplicity, $\varepsilon = \tfrac12$ is indicated. Numerical computation of $\operatorname{var}[S_\varepsilon]$ by means of (37) et seq. has been carried out for $\varepsilon = 0, 0.1, 0.2, \ldots, 0.9, 1$; $n = 3, 4, 5, \ldots, 10, 15, 20, 30, \ldots, 100$; $\rho = 0, 0.01, \ldots, 0.05, 0.10, \ldots, 0.90, 0.95, \ldots, 0.99$; $h = 0.001, 0.01, 0.1, 1$. The general conclusion is that the normalized variance $\gamma_\varepsilon$ of $S_\varepsilon$ tends to its limit 2 very rapidly, nearly independent of $\varepsilon$, $\rho$, h. More precisely, for $0 \le \rho \le 0.95$, all $\varepsilon$, all h, $n \ge 3$, we have $|\gamma_\varepsilon - 2| < 0.04$, and for $n \ge 10$, $|\gamma_\varepsilon - 2| < 0.01$. However, for $0.95 < \rho \le 0.99$, $\gamma_\varepsilon$ has a flat minimum for $0.1 \le \varepsilon \le 0.9$, rising about 40% for $0 \le \varepsilon < 0.1$ and $0.9 < \varepsilon \le 1$, all h, and $3 \le n \le 10$. In the same $\rho$, h range, $\gamma_\varepsilon$ is practically constant for $n \ge 20$. Thus the choice of $\varepsilon$ cannot be made on numerical grounds either when $0 \le \rho \le 0.95$, or when $n \ge 20$, in the range of h considered. For $0.95 < \rho \le 0.99$ and $n < 20$, $\gamma_\varepsilon$ increases considerably for $\varepsilon$ near 0 or 1, which seems to support the contention that $\varepsilon = \tfrac12$ (intra-class statistics) are best for non-circular correlation.

Thanks are expressed to Mr Rod Smart for checking (37), and to the staff of the 701 Computer, U.S. Naval Ordnance Test Station, China Lake, California, for computational assistance.
MOMENTS OF SAMPLE MOMENTS OF CENSORED SAMPLES
FROM A NORMAL POPULATION 


By J. G. SAW
University College London 


1. INTRODUCTION


Given a sample $y_1, y_2, \ldots, y_n$ of size n from a normal population, it is sometimes found that only the smallest r (or largest r) of these are available for statistical tests concerning the parameters of the population. In the case where the parent population is normal, mean $\mu$, variance $\sigma^2$ for example, and we are given only the r smallest observations, we may be
required to differentiate between a ‘null’ and an ‘alternative’ hypothesis concerning one 
of the parameters, the other being either numerically specified or unspecified. 

Should the whole sample of n be available, the appropriate maximum likelihood test 
function is well known in each case and is seen to be a simple function of the first two 
k-statistics of the sample.

Although the test functions appropriate to the statistical tests described above are considerably more complex when a censored sample only is available, it is found that these too may be written as a function of the first two k-statistics together with the rth smallest ordered variable. In order to find moments of these test functions we shall first require moments of the k-statistics, and the work which follows gives a method of finding the latter as an expansion in powers of $(n+2)^{-1}$.

The appropriate test functions for the four main single-sample tests which may arise 
when dealing with a normal population are being studied with regard to their distribution 
and power. For example, the simplest case which may occur is where we require to test for 
the location of a normal population, the standard deviation being known. In this instance, 
the distribution of a certain linear sum of censored mean and rth ordered variable is very 
close to normality, while the asymptotic relative efficiency is 81.83% when 50% of the sample is available and 95.63% when 80% of the sample is available. It is intended to
give a full account of the test functions for location, standard deviation being either known 
or unknown and for dispersion when the mean is either known or unknown, in a later paper. 

David & Johnson (1954) give formulae which approximate to the moments and product- 
moments of ordered variates, but state that these may not be suitable for extreme values 
(which are of course contained in the k-statistics of a singly censored sample). We shall 
show in the first part of this paper that our problem may be reduced to a simple extension of theirs.

§ 4 deals algebraically with the approximation to a set of integrals by a power series in $(n+2)^{-1}$, the integrals being fundamental to the solution of the problem. Precise algebraic expressions are given which occur under normal theory. In § 5 are tables of the coefficients of powers of $(n+2)^{-1}$ occurring in this power series for certain of the set of integrals and various values of the ratio $r/(n+1)$. In § 6 some comparisons are made between estimates provided by our series and the exact values given by Teichroew (1956) for $n = 19$.
Finally, it should be mentioned that the methods described in this paper may be used
for distributions other than normal and for doubly censored samples. 


2. NOTATION 


We take $y_1, y_2, \ldots, y_n$ as a random sample of n from a unit normal population, mean zero. $x_r$ will be used to denote the rth smallest of the y's; $z_1, z_2, \ldots, z_{r-1}$ will be used to denote the $r-1$ y's smaller than $x_r$, and are subject only to the condition

$z_i < x_r$ $(i = 1, 2, \ldots, r-1)$,

being randomly ordered amongst themselves.
The k-statistics, $k_1$ and $k_2$, are defined in the usual way, viz.

$$k_1 = \frac{1}{r}\Big(\sum_{i=1}^{r-1}z_i + x_r\Big),\qquad k_2 = \frac{1}{r-1}\Big(\sum_{i=1}^{r-1}z_i^2 + x_r^2 - rk_1^2\Big).$$
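We shall write

$$Z(x) = (2\pi)^{-1/2}e^{-\frac12x^2},\qquad F(x) = \int_{-\infty}^{x}Z(u)\,du,\qquad p_r = r/(n+1).$$

For reasons which will become clear later, we define a function

$$\psi(p_r, n: a, b) = N\int_{-\infty}^{\infty}x^a[Z(x)]^{b+1}[F(x)]^{r-1-b}[1 - F(x)]^{n-r}\,dx,\qquad N = \frac{n!}{(r-1)!\,(n-r)!},$$

which will usually be abbreviated to $\psi_{ab}$ when the values of r and n are clear from the context.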

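We notice that

$$p(z_i \mid x_r)\,dz_i = \frac{Z(z_i)}{F(x_r)}\,dz_i,\qquad -\infty < z_i < x_r\quad(i = 1, 2, \ldots, r-1),$$

$$p(x_r)\,dx_r = \frac{n!}{(r-1)!\,(n-r)!}[F(x_r)]^{r-1}[1 - F(x_r)]^{n-r}Z(x_r)\,dx_r,\qquad -\infty < x_r < +\infty.$$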
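These densities are enough to evaluate simple moments by quadrature, which is useful as a check on any series approximation. The Python sketch below (standard library only; names are hypothetical) integrates against the density of $x_r$, computes $E(k_1)$ by using $E(z_i \mid x_r) = -Z(x_r)/F(x_r)$, and compares $E(x_r)$ with the leading approximation $X_p$, where $F(X_p) = r/(n+1)$, in the spirit of David & Johnson (1954).

```python
import math
from statistics import NormalDist


def Z(x):
    """Unit normal density Z(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def F(x):
    """Unit normal distribution function, via erfc to keep accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def density_xr(x, r, n):
    """Density of x_r, the r-th smallest of n unit normal observations."""
    const = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    upper = 0.5 * math.erfc(x / math.sqrt(2.0))          # 1 - F(x) without cancellation
    return const * F(x) ** (r - 1) * upper ** (n - r) * Z(x)


def expect_xr(g, r, n, lo=-10.0, hi=10.0, m=4001):
    """Trapezoidal approximation to E[g(x_r)]."""
    step = (hi - lo) / (m - 1)
    total = 0.0
    for i in range(m):
        x = lo + i * step
        weight = 0.5 if i in (0, m - 1) else 1.0
        total += weight * g(x) * density_xr(x, r, n)
    return total * step


def mean_k1(r, n):
    """E(k_1) with k_1 = (z_1 + ... + z_{r-1} + x_r) / r."""
    return expect_xr(lambda x: ((r - 1) * (-Z(x) / F(x)) + x) / r, r, n)


if __name__ == "__main__":
    r, n = 15, 19
    exact = expect_xr(lambda x: x, r, n)
    leading = NormalDist().inv_cdf(r / (n + 1))          # X_p with p = r/(n+1)
    print(exact, leading, mean_k1(r, n))
```

When r = n the whole sample is available and $E(k_1)$ must vanish, which gives a convenient check on the conditional-mean step.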


3. RELATIONSHIP BETWEEN MOMENTS AND THE INTEGRALS $\psi(p_r, n: a, b)$


Moments about zero of the k-statistics may be obtained as a finite linear sum of integrals of the type $\psi(p_r, n: a, b)$, which we shall demonstrate by application to the first moment of $k_2$. Writing $k_2$ in the form


iret. 1 22. t-1 ) 
2 2 meri y * T , 341 
Ie r (= ere 121 D x Em 2^ à 5 
since p(z; r) = / Ha,) and Xi 2; are independent, then 

1 Z Z(æ y]? Z(v,) : 

E(k, rf uli- re] air- Zr o, "e 32) 
scat (ime) hamid | Wael fs a | 

bs. Ze " 

But it will be seen that é Fel = ya (3) 


whence E(k) = 1 —1) 0 (+3) Vn V - (r- 2) Yao}. ed | 


| 
The value of $\psi_{00}$ is of course unity, but the symbol has been left in the formula (3.4) in view of equation (3.8), which gives a relation between the product-moment $E(k\,x_r)$ and the integrals $\psi$.

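Proceeding according to this example, we obtain the following relations for the first four moments of $k_1$ and the first two of $k_2$:

$$\begin{aligned}
r\,\mathcal{E}(k_1) &= -(r-1)\psi_{10} + \psi_{01},\\
r^2\,\mathcal{E}(k_1^2) &= (r-1)^2\psi_{20} + (r-1)(\psi_{00} - 3\psi_{11} - \psi_{20}) + \psi_{02},\\
r^3\,\mathcal{E}(k_1^3) &= -(r-1)^3\psi_{30} + (r-1)^2(6\psi_{21} - 3\psi_{10} + 3\psi_{30})\\
&\quad + (r-1)(\psi_{10} + 3\psi_{01} - 7\psi_{12} - 6\psi_{21} - 2\psi_{30}) + \psi_{03},\\
r^4\,\mathcal{E}(k_1^4) &= (r-1)^4\psi_{40} + (r-1)^3(6\psi_{20} - 10\psi_{31} - 6\psi_{40})\\
&\quad + (r-1)^2(3\psi_{00} - 18\psi_{11} - 10\psi_{20} + 25\psi_{22} + 30\psi_{31} + 11\psi_{40})\\
&\quad + (r-1)(6\psi_{02} + 7\psi_{11} + 4\psi_{20} - 15\psi_{13} - 25\psi_{22} - 20\psi_{31} - 6\psi_{40}) + \psi_{04}, \qquad (3.5)
\end{aligned}$$

$$\begin{aligned}
r\,\mathcal{E}(k_2) &= (r-1)\psi_{00} - (r-3)\psi_{11} + \psi_{02} - (r-2)\psi_{20},\\
r^2(r-1)\,\mathcal{E}(k_2^2) &= (r-1)\psi_{04} - (3r^2 - 14r + 15)\psi_{13} + (r-2)(r^2 - 12r + 25)\psi_{22}\\
&\quad + (r-2)(r-3)(2r-10)\psi_{31} + (r-2)(r-3)(r-4)\psi_{40}\\
&\quad - (2r^3 - 9r^2 + 20r - 25)\psi_{11} - (r-2)(2r^2 - 4r + 10)\psi_{20}\\
&\quad + (2r^2 - 4r + 6)\psi_{02} + (r^3 - r^2 + r - 3)\psi_{00}. \qquad (3.6)
\end{aligned}$$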
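The relations (3.4) to (3.6) for the first two moments of $k_1$ and $k_2$ can be checked numerically. The sketch below illustrates this under stated assumptions and is not the author's procedure. It evaluates $\psi(p_r, n: a, b)$ by composite Simpson quadrature over the finite range $-10$ to $10$, which is an arbitrary choice; §4 explains why a series method was preferred for tabulation. It then compares the resulting moments with a seeded Monte Carlo simulation of the $r$ smallest of $n$ standard normal observations. The helper names `psi`, `k_moments_from_psi` and `k_moments_by_simulation` are hypothetical, and the quadrature as written assumes $a \le r - 1$.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # Z(x) = _STD.pdf(x), F(x) = _STD.cdf(x)


def psi(n, r, a, b, lo=-10.0, hi=10.0, steps=4000):
    """psi(p_r, n: a, b) = E[x_r**b * (Z(x_r)/F(x_r))**a] by composite Simpson's rule."""
    if a > r - 1:
        raise ValueError("this sketch assumes a <= r - 1")
    norm = math.exp(math.lgamma(n + 1) - math.lgamma(r) - math.lgamma(n - r + 1))

    def f(x):
        z, cdf = _STD.pdf(x), _STD.cdf(x)
        # (Z/F)**a * F**(r-1) * Z rewritten as Z**(a+1) * F**(r-1-a), so F is never a divisor
        return x**b * z ** (a + 1) * cdf ** (r - 1 - a) * (1.0 - cdf) ** (n - r)

    h = (hi - lo) / steps  # steps must be even for Simpson's rule
    total = f(lo) + f(hi) + sum((4 if k % 2 else 2) * f(lo + k * h) for k in range(1, steps))
    return norm * total * h / 3.0


def k_moments_from_psi(n, r):
    """E(k1), E(k1^2), E(k2), E(k2^2) from the psi-relations (3.4)-(3.6)."""
    P = {(a, b): psi(n, r, a, b) for a in range(5) for b in range(5) if a + b <= 4}
    k1 = (-(r - 1) * P[1, 0] + P[0, 1]) / r
    k1_sq = ((r - 1) ** 2 * P[2, 0] + (r - 1) * (P[0, 0] - 3 * P[1, 1] - P[2, 0]) + P[0, 2]) / r**2
    k2 = ((r - 1) * P[0, 0] - (r - 3) * P[1, 1] + P[0, 2] - (r - 2) * P[2, 0]) / r
    k2_sq = ((r - 1) * P[0, 4] - (3 * r**2 - 14 * r + 15) * P[1, 3]
             + (r - 2) * (r**2 - 12 * r + 25) * P[2, 2]
             + (r - 2) * (r - 3) * (2 * r - 10) * P[3, 1]
             + (r - 2) * (r - 3) * (r - 4) * P[4, 0]
             - (2 * r**3 - 9 * r**2 + 20 * r - 25) * P[1, 1]
             - (r - 2) * (2 * r**2 - 4 * r + 10) * P[2, 0]
             + (2 * r**2 - 4 * r + 6) * P[0, 2]
             + (r**3 - r**2 + r - 3) * P[0, 0]) / (r**2 * (r - 1))
    return {"k1": k1, "k1_sq": k1_sq, "k2": k2, "k2_sq": k2_sq}


def k_moments_by_simulation(n, r, reps=20000, seed=1):
    """Monte Carlo estimates from the r smallest of n N(0, 1) observations."""
    rng = random.Random(seed)
    sums = {"k1": 0.0, "k1_sq": 0.0, "k2": 0.0, "k2_sq": 0.0}
    for _ in range(reps):
        lowest = sorted(rng.gauss(0.0, 1.0) for _ in range(n))[:r]
        k1 = sum(lowest) / r
        k2 = sum((y - k1) ** 2 for y in lowest) / (r - 1)  # usual k-statistic divisor r - 1
        for key, val in (("k1", k1), ("k1_sq", k1 * k1), ("k2", k2), ("k2_sq", k2 * k2)):
            sums[key] += val
    return {key: total / reps for key, total in sums.items()}
```

With $n = 19$ and $r = 10$, the dictionaries returned by `k_moments_from_psi(19, 10)` and `k_moments_by_simulation(19, 10)` should agree to within simulation error. `psi(19, 11, 0, 1)` should also lie close to the Table 3 entry 0.13072488.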


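The following formulae, which are of interest, may also be noted:

$$\begin{aligned}
\mathcal{E}(x_r) &= \psi_{01},\\
\mu_2(x_r) &= \psi_{02} - \psi_{01}^2,\\
\mu_3(x_r) &= \psi_{03} - 3\psi_{02}\psi_{01} + 2\psi_{01}^3,\\
\mu_4(x_r) &= \psi_{04} - 4\psi_{03}\psi_{01} + 6\psi_{02}\psi_{01}^2 - 3\psi_{01}^4. \qquad (3.7)
\end{aligned}$$

The product-moment of any k-statistic with $x_r$ may be written in the form

$$\mathcal{E}(k x_r) = V\,\mathcal{E}(k), \qquad (3.8)$$

where $V$ is the operator such that

$$V\psi_{ab} = \psi_{a,b+1}, \qquad (3.9)$$

so that, for example,

$$r\,\mathcal{E}(k_2 x_r) = (r-1)\psi_{01} - (r-3)\psi_{12} + \psi_{03} - (r-2)\psi_{21}.$$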


4. METHOD OF EVALUATING THE INTEGRAL $\psi(p_r, n: a, b)$

It has been demonstrated in §3 how evaluation of moments of k-statistics reduces to the evaluation of a set of integrals of the same form, $\psi(p_r, n: a, b)$. In isolated cases the integrals are known; for example, we may deduce that

$$\psi(\tfrac{1}{2}, 2m+1: 0, 2b+1) = 0 \qquad (m = 0, 1, \ldots, \infty;\; b = 0, 1, \ldots, \infty),$$

that is, all odd moments of the median vanish. Methods of quadrature may be applied to the integral, although it is to be expected from the form of the integral and the range of integration that a large number of ordinates would be necessary to produce reasonable accuracy. Be that as it may, a more serious disadvantage of applying quadrature is that each of $r$, $n$, $a$ and $b$ must have its numerical value specified explicitly.

It will be shown that there is a method of closely approximating to the true value of the integral when only $a$, $b$ and the ratio $r/(n+1)$ are numerically specified, by equating it to a truncated power series in $(n+2)^{-1}$. This approximation to the integral follows a method used by David & Johnson (1954).

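We see that

$$\psi(p_r, n: a, b) = \int_0^1 x^b\left[\frac{Z(x)}{F(x)}\right]^a p(F)\,dF, \qquad (4.1)$$

$x$ being regarded as a function of $F$, where

$$p(F) = \frac{F^{r-1}(1-F)^{n-r}}{B(r, n-r+1)}. \qquad (4.2)$$

Now $F$ is densest around $F = r/(n+1) = p_r$, say, and we define $X_r$ by $F(X_r) = p_r$, and put

$$\delta F = F - \mathcal{E}(F) = F - p_r. \qquad (4.3)$$

Using an inverse Taylor expansion we may, by expanding about $X_r$, write

$$x = c_0 + c_1\,\delta F + c_2\,\delta F^2 + \cdots + c_j\,\delta F^j + \cdots, \qquad \frac{Z(x)}{F(x)} = d_0 + d_1\,\delta F + d_2\,\delta F^2 + \cdots + d_j\,\delta F^j + \cdots, \qquad (4.4)$$

whence

$$x^b\left[\frac{Z(x)}{F(x)}\right]^a = \sum_{t=0}^{\infty}(c^b d^a)_t\,\delta F^t, \qquad (4.5)$$

$(c^b d^a)_t$ being the coefficient of $\delta F^t$ in $\big(\sum_j c_j\,\delta F^j\big)^b\big(\sum_j d_j\,\delta F^j\big)^a$, so that by (2.1) and (2.4)

$$\psi(p_r, n: a, b) = \sum_{t=0}^{\infty}(c^b d^a)_t\,M_t, \qquad (4.6)$$

when this converges, and where $M_t = \mathcal{E}(\delta F^t)$ is the $t$th central moment of the Beta distribution of equation (4.2).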

If $r$, $n$, $a$, $b$ are each numerically specified, we have here an alternative to quadrature for evaluating $\psi(p_r, n: a, b)$, for example by taking the first (say) eleven terms of the sum (4.6). Since $M_t$ is of order $n^{-[(t+1)/2]}$, the remainder $\sum_{t=2l+1}^{\infty}(c^b d^a)_t M_t$ is of order $n^{-(l+1)}$; thus we may approximate as closely as desired to $\psi(p_r, n: a, b)$ by taking a sufficient number of terms of the sum, at least for $n$ sufficiently large.


However, if only $p_r$, $a$, $b$ are specified numerically, we see that since $c_j$ and $d_j$ are simple functions of $p_r$, $X_r$ and $Z_r = Z(X_r)$ (which are given below, Table 1) and since

$$M_t = \sum_{i=0}^{\infty} m_{ti}(n+2)^{-i}, \qquad (4.7)$$

where $m_{ti}$ is a function of $p_r$ only (given in Table 2), we have

$$\psi(p_r, n: a, b) = \sum_{i=0}^{l}\sum_{t=0}^{2i}(c^b d^a)_t\,m_{ti}\,(n+2)^{-i} + O(n+2)^{-(l+1)} \qquad (4.8)$$

$$= \sum_{i=0}^{l} H_i(p_r, a, b)\,(n+2)^{-i} + O(n+2)^{-(l+1)}, \qquad (4.9)$$

where

$$H_i(p_r, a, b) = \sum_{t=0}^{\infty}(c^b d^a)_t\,m_{ti} = \sum_{t=0}^{2i}(c^b d^a)_t\,m_{ti}, \qquad (4.10)$$

since $m_{ti} = 0$ for $t > 2i$. Now the $c_j$ are seen to be just polynomials in $X_r$ multiplied by $1/(Z_r^j\,j!)$; thus if we put

$$c_j = \frac{1}{Z_r^j\,j!}\sum_k h_{jk} X_r^k, \qquad (4.11)$$

we may table the integers $h_{jk}$.


Table 1. Values of $h_{jk}$ for $0 \le j \le 10$

 j    k=0       k=1      k=2        k=3        k=4        k=5        k=6        k=7      k=8      k=9
 0      0         1
 1      1
 2      0         1
 3      1         0        2
 4      0         7        0          6
 5      7         0       46          0         24
 6      0       127        0        326          0        120
 7    127         0    1,740          0      2,556          0        720
 8      0     4,369        0     22,404          0     22,212          0      5,040
 9  4,369         0  102,164          0    290,292          0    212,976          0   40,320
10      0   243,649        0  2,080,644          0  3,890,484          0  2,239,344        0   362,880
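We note the relation

$$h_{jk} = (j-1)\,h_{j-1,k-1} + (k+1)\,h_{j-1,k+1};$$

the proof is easily effected by induction. The $d$'s are more readily expressible in terms of the $c$'s, thus:

$$\begin{aligned}
d_0 &= Z_r p_r^{-1},\\
d_1 &= -Z_r p_r^{-2} - c_0 p_r^{-1},\\
2!\,d_2 &= 2Z_r p_r^{-3} + 2c_0 p_r^{-2} - c_1 p_r^{-1},\\
3!\,d_3 &= -6Z_r p_r^{-4} - 6c_0 p_r^{-3} + 3c_1 p_r^{-2} - 2c_2 p_r^{-1}, \quad \text{etc.},
\end{aligned}$$

and a useful recurrence relation for the $d$'s, which may be proved by induction, is

$$d_j = -\frac{1}{p_r}\Big(d_{j-1} + \frac{c_{j-1}}{j}\Big).$$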
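The integers of Table 1 and the coefficients $c_j$, $d_j$ can be generated directly from the recurrences just stated. In this sketch the function names are hypothetical. The polynomial $P_j(x) = \sum_k h_{jk}x^k$ is built from $P_0(x) = x$ via $P_j = P'_{j-1} + (j-1)\,x\,P_{j-1}$, which is the stated relation for $h_{jk}$ written in polynomial form. $X_r$ and $Z_r$ come from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def h_table(jmax):
    """Integers h[j][k] such that c_j = sum_k h[j][k] * X_r**k / (j! * Z_r**j).

    Uses h[j][k] = (k + 1) * h[j-1][k+1] + (j - 1) * h[j-1][k-1], starting from P_0(x) = x.
    Row j has j + 2 entries; trailing entries are zero.
    """
    rows = [[0, 1]]  # P_0(x) = x, so c_0 = X_r
    for j in range(1, jmax + 1):
        prev = rows[-1]
        cur = [0] * (len(prev) + 1)
        for k in range(len(cur)):
            if k + 1 < len(prev):
                cur[k] += (k + 1) * prev[k + 1]   # from the derivative P'_{j-1}
            if 1 <= k <= len(prev):
                cur[k] += (j - 1) * prev[k - 1]   # from (j - 1) * x * P_{j-1}
        rows.append(cur)
    return rows


def c_and_d(p, jmax):
    """Taylor coefficients, in powers of dF = F - p_r, of x(F) and of Z(x)/F(x)."""
    std = NormalDist()
    x_r = std.inv_cdf(p)   # X_r, defined by F(X_r) = p_r
    z_r = std.pdf(x_r)     # Z_r = Z(X_r)
    h = h_table(jmax)
    c = [sum(hk * x_r**k for k, hk in enumerate(h[j])) / (math.factorial(j) * z_r**j)
         for j in range(jmax + 1)]
    d = [z_r / p]          # d_0 = Z_r / p_r
    for j in range(1, jmax + 1):
        d.append(-(d[j - 1] + c[j - 1] / j) / p)   # d_j = -(d_{j-1} + c_{j-1}/j) / p_r
    return c, d
```

For example, `h_table(10)[6]` begins `[0, 127, 0, 326, 0, 120]`, which matches row 6 of Table 1.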
In passing, it is worth noting that if $r$ and $n$ are specified numerically, so that we are interested in the summation (4.6), where $M_t$ must be known, as opposed to the case where we know numerically only the ratio $r/(n+1)$ and so use the summation (4.8) involving the $m_{ti}$, then the recurrence relation

$$(n+t+1)\,M_{t+1} = t\,\frac{n-2r+1}{n+1}\,M_t + t\,\frac{r(n-r+1)}{(n+1)^2}\,M_{t-1}$$

provides the easiest way of calculating the $M_t$. The following table gives values of $m_{ti}$ in terms of $u_r = 1 - 2p_r$ and $v_r = p_r(1-p_r) = r(n-r+1)/(n+1)^2$.
In addition we note that $m_{0i} = 0$ $(i > 0)$, $m_{00} = 1$; $m_{1i} = 0$ for all $i$; $m_{2i} = 0$ $(i > 1)$, $m_{21} = v_r$; and $m_{ti} = 0$ $(t > 2i)$.


Table 2. Values of $m_{ti}$ in terms of $u = u_r$ and $v = v_r$

 t    i = 2     i = 3          i = 4              i = 5
 3    2uv       −2uv           2uv                −2uv
 4    3v²       6u²v − 6v²     12v² − 18u²v       42u²v − 24v²
 5    Zero      20uv²          24u³v − 92uv²      332uv² − 144u³v
 6    Zero      15v³           130u²v² − 90v³     120u⁴v − 1070u²v² + 420v³
 7    Zero      Zero           210uv³             924u³v² − 2142uv³
 8    Zero      Zero           105v⁴              2380u²v³ − 1260v⁴
 9    Zero      Zero           Zero               2520uv⁴
10    Zero      Zero           Zero               945v⁵
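The recurrence for $M_t$ serves both cases above. When $r$ and $n$ are given, it yields the $M_t$ needed in (4.6) directly. When only $p_r$ is given, writing $N = n + 2$ and expanding $1/(N + t - 1)$ in powers of $N^{-1}$ yields the $m_{ti}$ of Table 2. The sketch below does both, and the function names are hypothetical.

```python
def central_moments(n, r, tmax):
    """Exact M_0..M_tmax of F ~ Beta(r, n - r + 1) via
    (n + t + 1) M_{t+1} = t (n - 2r + 1)/(n + 1) M_t + t r(n - r + 1)/(n + 1)**2 M_{t-1}."""
    u = (n - 2 * r + 1) / (n + 1)
    v = r * (n - r + 1) / (n + 1) ** 2
    moments = [1.0, 0.0]  # M_0 = 1, M_1 = 0
    for t in range(1, tmax):
        moments.append(t * (u * moments[t] + v * moments[t - 1]) / (n + t + 1))
    return moments


def m_table(p, imax):
    """m[t][i]: coefficient of (n + 2)**(-i) in M_t when only p_r is given (t <= 2*imax).

    Same recurrence with N = n + 2, u = 1 - 2p, v = p(1 - p), and
    1/(N + t - 1) = N**-1 * sum_k (-(t - 1))**k * N**-k.
    """
    u, v = 1.0 - 2.0 * p, p * (1.0 - p)
    tmax = 2 * imax
    m = [[0.0] * (imax + 1) for _ in range(tmax + 1)]
    m[0][0] = 1.0  # M_0 = 1; the row for M_1 = 0 stays all zeros
    for t in range(1, tmax):
        for i in range(1, imax + 1):
            m[t + 1][i] = t * sum(
                (-(t - 1)) ** k * (u * m[t][i - 1 - k] + v * m[t - 1][i - 1 - k])
                for k in range(i)
            )
    return m
```

For example, take $n = 199$ and $r = 130$, so that $p_r = 0.65$. Summing `m_table(0.65, 5)[4][i] * 201**-i` over $i$ should then reproduce `central_moments(199, 130, 10)[4]` up to a truncation error of order $201^{-6}$.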


5. NUMERICAL VALUES OF THE COEFFICIENTS $H_i(p_r, a, b)$

Numerical values of the coefficients of $(n+2)^{-i}$, that is the $H_i(p_r, a, b)$ occurring in (4.9), which arise in the expansion of $\psi(p_r, n: a, b)$, have been obtained using the method described, for values of $a + b \le 4$, $p_r = 0.50(0.05)0.80$ and $i = 0(1)5$; these are given in Table 4, printed on pp. 218–221. It is thought that these figures are at most two units in error in the last place of decimals given.
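Taken together, (4.4) to (4.11) let us compute the coefficients $H_i(p_r, a, b)$ of (4.9) for any $p_r$, $a$ and $b$. The self-contained sketch below uses hypothetical names and repeats the $h_{jk}$, $d_j$ and $m_{ti}$ recurrences so that it runs on its own. It multiplies the truncated series for $x$ and $Z/F$ to obtain $(c^b d^a)_t$, then forms $H_i = \sum_{t \le 2i}(c^b d^a)_t\,m_{ti}$. It illustrates the method described and is not the program used to compute Table 4.

```python
import math
from statistics import NormalDist


def _h_rows(jmax):
    # h[j][k] = (k+1) h[j-1][k+1] + (j-1) h[j-1][k-1], starting from P_0(x) = x
    rows = [[0, 1]]
    for j in range(1, jmax + 1):
        prev = rows[-1] + [0, 0]  # pad so prev[k + 1] always exists
        rows.append([(k + 1) * prev[k + 1] + (j - 1) * (prev[k - 1] if k else 0)
                     for k in range(len(rows[-1]) + 1)])
    return rows


def _mul(x, y):
    # product of two power series in dF, truncated to len(x) terms
    return [sum(x[k] * y[t - k] for k in range(t + 1)) for t in range(len(x))]


def H(p, a, b, imax=5):
    """Coefficients H_0..H_imax(p_r, a, b) of (n + 2)**(-i) in psi(p_r, n: a, b)."""
    tmax = 2 * imax
    std = NormalDist()
    x_r = std.inv_cdf(p)
    z_r = std.pdf(x_r)
    h = _h_rows(tmax)
    c = [sum(hk * x_r**k for k, hk in enumerate(h[j])) / (math.factorial(j) * z_r**j)
         for j in range(tmax + 1)]
    d = [z_r / p]
    for j in range(1, tmax + 1):
        d.append(-(d[j - 1] + c[j - 1] / j) / p)
    s = [1.0] + [0.0] * tmax          # (c^b d^a)_t for t = 0..tmax
    for _ in range(b):
        s = _mul(s, c)
    for _ in range(a):
        s = _mul(s, d)
    u, v = 1.0 - 2.0 * p, p * (1.0 - p)
    m = [[0.0] * (imax + 1) for _ in range(tmax + 1)]
    m[0][0] = 1.0
    for t in range(1, tmax):
        for i in range(1, imax + 1):
            m[t + 1][i] = t * sum((-(t - 1)) ** k * (u * m[t][i - 1 - k] + v * m[t - 1][i - 1 - k])
                                  for k in range(i))
    return [sum(s[t] * m[t][i] for t in range(2 * i + 1)) for i in range(imax + 1)]
```

For example, the first few entries of `H(0.55, 0, 1)` and `H(0.55, 1, 0)` should agree with the $p_r = 0.55$ rows of the $a = 0, b = 1$ and $a = 1, b = 0$ panels of Table 4, to the accuracy stated above.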


6. ACCURACY OF THE TABLES

In an article edited by Teichroew (1956), the values of $\mathcal{E}(x_i; N)$ and $\mathcal{E}(x_i x_j; N)$ have been evaluated by quadrature to ten places of decimals.* Since

$$\mathcal{E}(x_r) = \psi_{01}, \qquad \mathcal{E}(x_r^2) = \psi_{02}, \qquad \mathcal{E}(z_r) = -\psi_{10}, \qquad \mathcal{E}(z_r x_r) = -\psi_{11},$$

where $z_r = \sum_{i=1}^{r-1} x_i/(r-1)$, we may compare values of $\psi_{01}$, $\psi_{02}$, $\psi_{10}$, $\psi_{11}$ obtained

(i) using Teichroew's tables,
(ii) using the first six terms of (4.9), i.e. with $l = 5$.

The results are shown in Table 3 for $n = 19$, $r = 10, 11, \ldots, 16$. The error, i.e. the difference (i) − (ii), for these particular values of $a$ and $b$ is seen to be, at worst, five in the seventh significant figure, although the results often agree to eight significant figures.

* In his article, Teichroew uses $x_i$ as the $i$th largest observation in a sample of $N$ from a unit normal population.
Table 3. Comparison of approximate values of $\psi_{ab}$ with true values for $n = 19$

         ψ_01                      ψ_02
 r       By (ii)        Error      By (ii)        Error
 10      Zero           Zero       [illegible]    +4
 11      +0.13072488    +0         +0.09837659    +4
 12      +0.26374289    +0         [illegible]    +3
 13      +0.40164227    +0         [illegible]    +2
 14      +0.54770736    +2         [illegible]    +4
 15      +0.70661142    +6         [illegible]    +7
 16      +0.88586168    +28        [illegible]    +33

         ψ_10                      ψ_11
 r       By (ii)        Error      By (ii)        Error
 10      +0.80668488    +0         −0.05106715    +2
 11      +0.72601640    +0         [illegible]    −2
 12      +0.64813083    +0         [illegible]    −3
 13      +0.57214138    +0         +0.18406009    −3
 14      +0.49723492    +0         +0.22783843    +1
 15      +0.42259618    +0         +0.25513535    −3
 16      +0.34731563    +4         +0.26808331    [illegible]

N.B. The 'error' is (result from (i) − result from (ii)) × 10⁸.


7. SUMMARY AND COMMENT

In §3, we derived explicit relations between moments of sample moments of a censored normal sample and certain integrals $\psi(p_r, n: a, b)$.

§4 divides itself into two parts, corresponding to two cases.

Case (i). This arises when we are given $r$ and $n$ numerically and we wish to determine the moments of sample moments. In this event we approximate to the integral by taking a sufficient number of terms of an infinite sum (equation (4.6)), noting that if we take the first $2l+1$ terms of the sum the error is of order $(n+2)^{-(l+1)}$. It is then only necessary to substitute the numerical values of $\psi_{ab}$ into the appropriate equations (3.5)–(3.8) to obtain numerical values of the moments for the given $r$ and $n$.

Case (ii). It would clearly be laborious to prepare tables of $\psi_{ab}$ for all values of $r$ and $n$, even if we were to restrict $r$ and $n$ to be less than 20, and such a restriction would severely limit the application of the tables. It has therefore been supposed that only the values of $r/(n+1)$, $a$ and $b$ are given numerically, and an expansion for $\psi(p_r, n: a, b)$ has been obtained as a power series in $(n+2)^{-1}$, the coefficients of which are functions of $p_r$, $a$ and $b$ only (equation (4.9)). We are then in a position to evaluate $\psi_{ab}$ for any values of $r$ and $n$ such that $r/(n+1) = p_r$. Moreover, by using this treatment we are better able to study the moments of the sample moments, and related problems suggested in the introduction, as $n$ becomes large.
In §5, as a consequence of the remarks made above in case (ii), values of $H_i(p_r, a, b)$ have been obtained for $p_r = 0.50(0.05)0.80$ and $a + b \le 4$. These were given in Table 4 and provide a means of obtaining the value of $\psi$ for a large range of values of $r$ and $n$.

It will often be necessary to interpolate when $p_r = r/(n+1)$ is not one of the tabled values. In these cases it will be found more convenient to work out the numerical values of $\psi(p_r, n: a, b)$ for the tabled values of $p_r$ on either side, and then to interpolate between these, rather than to interpolate in Table 4 for $H_i(p_r, a, b)$ and then obtain $\psi(p_r, n: a, b)$.
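The interpolation procedure recommended above can be sketched as follows. Evaluate the truncated series (4.9) at the two tabled values of $p_r$ on either side of $r/(n+1)$, using the Table 4 coefficients, then interpolate between the two results. Three points are assumptions of this sketch. The interpolation is linear, since the text does not specify the order. The $i = 5$ coefficients are omitted because they are not legible in Table 4 as reproduced here. The function names are hypothetical.

```python
def psi_series(h_coeffs, n):
    """Truncated expansion (4.9): sum_i H_i(p_r, a, b) * (n + 2)**(-i)."""
    return sum(h_i / (n + 2) ** i for i, h_i in enumerate(h_coeffs))


def psi_interpolated(p, n, p_lo, h_lo, p_hi, h_hi):
    """Evaluate psi at the tabled p_r on either side, then interpolate linearly in p_r."""
    psi_lo, psi_hi = psi_series(h_lo, n), psi_series(h_hi, n)
    w = (p - p_lo) / (p_hi - p_lo)
    return (1.0 - w) * psi_lo + w * psi_hi


# H_0..H_4 for a = 0, b = 1 taken from Table 4 (i = 5 omitted)
H01_055 = [0.1256613469, 0.099262368, 0.14089642, 0.1552289, 0.097953]
H01_060 = [0.2533471031, 0.203681761, 0.29158668, 0.3197356, 0.186023]

# n = 19, r = 11 gives p_r = 0.55 exactly (compare psi_01 at r = 11 in Table 3)
psi_01_r11 = psi_series(H01_055, 19)
# n = 29, r = 17 gives p_r = 17/30, between the tabled 0.55 and 0.60
psi_01_r17 = psi_interpolated(17 / 30, 29, 0.55, H01_055, 0.60, H01_060)
```

With $n = 19$, the series for $p_r = 0.55$ gives about 0.1307249, in line with the Table 3 value 0.13072488 for $\psi_{01}$ at $r = 11$.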

In §6, certain values of $\psi(p_r, 19: a, b)$ are compared with some known results for $a = 0$, $b = 1, 2$ and $a = 1$, $b = 0, 1$. It will be seen that for $n$ as small as 19 good agreement is obtained. This is because for these values of $a$ and $b$, $H_i(p_r, a, b)$ remains of order one. When $a + b$ takes higher values of 3 or 4, where some $H_i(p_r, a, b)$ are of order 100, it is thought that $n = 50$ will give similar accuracy.


I wish to express my gratitude to Dr F. N. David and Dr D. E. Barton for their suggestions and constructive criticisms, which have proved invaluable in preparing this paper.
Table 4. Values of $H_i(p_r, a, b)$

a = 1; b = 0
p_r    i = 0            1                2               3              4             5
0.50   +0.79788 45608   +0.17122 7492    +0.26759 482    +0.35180 53    +0.35083 2    +…
0.55   +0.71964 52336   +0.12315 2558    +0.20870 247    +0.29503 66    +0.31198 8    +…
0.60   +0.64390 42225   +0.08049 2217    +0.16088 721    +0.25335 77    +0.27780 6    +…
0.65   +0.56984 46222   +0.04185 6190    +0.12255 155    +0.22605 22    +0.24594 9    +…
0.70   +0.49670 37340   +0.00620 0557    +0.09337 578    +0.21432 10    +0.20896 4    −…
0.75   +0.42370 20969   −0.02729 4186    +0.07461 594    +0.22205 48    +0.14575 2    −…
0.80   +0.34995 24005   −0.05929 8026    +0.07032 069    +0.25798 16    −0.00997 8    −…

a = 0; b = 1
p_r    i = 0            1                2               3              4             5
0.50   Zero             Zero             Zero            Zero           Zero          Zero
0.55   +0.12566 13469   +0.09926 2368    +0.14089 642    +0.15522 89    +0.09795 3    −…
0.60   +0.25334 71031   +0.20368 1761    +0.29158 668    +0.31973 56    +0.18602 3    −…
0.65   +0.38532 04664   +0.31947 2780    +0.46443 172    +0.50459 63    +0.24416 0    −…
0.70   +0.52440 05127   +0.45547 1782    +0.67845 307    +0.72508 33    +0.22009 7    −…
0.75   +0.67448 97502   +0.62618 5313    +0.96821 269    +1.00430 49    −0.04298 9    −…
0.80   +0.84162 12336   +0.85903 0815    +1.40772 958    +1.37735 35    −1.09851 6    −…

$$\psi(p_r, n: a, b) = \sum_{i=0}^{l} H_i(p_r, a, b)\,(n+2)^{-i} + O(n+2)^{-(l+1)}$$
Table 4 (cont.)
Pe t=O 1 2 5 * — 
moda. - 1 IN — 
OSO «06300197724 | 0900859817 eee 
55 + 51788 92622 | + 761878009 | + L0694 096 + (13990600 + 144394. + 1-0973 
0 + 41461 26478 + — 04036 5184 | + 085548 320 + LOSS 16 + 1309290. + 144291 
0-65 4032472 28935 | + (538903236 + #67052198 + 04209100 + 1052264 + 120498 
10% + -24671 46000 | + -453011344 | + -52330977 + — 41808 80 + ONIS + D-09008 
7 + 0795234000 + 378879236 | + 5384002€7  —— 439807 + — 1183159 |. 092411 | 
‘80 0 02246 60826 | + -313458958 | + 3590 496 | + 2932620 + 4150012 + 55508 
x — 
a = 1; b = 1
Pe i=0 | 1 | 2 * 
M — i 
650| Zero | = rooooooooo | — I42920387 — lee en — lee — 186303 
55 | +0-09043 15893 | — 0-874139092 | — 118344855 — 14228077 = 16234 — 1-058585 
| -60 | + -16313 12694 | — -777425743 | — 0-98742 900 141786 0 
| 0-05 | +0-21957 27055 | — 0-704383338 | — 0-82519 724 
‘70 | + -26047 16931 | — -651554388 | —  -08346 406 
75 | 42857827215 | — -617059012 | — -54803005 
80 + -29452 73710 | = 
a = 0; b = 2
[i I 
P, i20 1 | 2 3 | s | : | 
p i . 
0-50 Zero + 157079 6327 + + | 
‘55 | +0-01579 07741 | + 1-60478 6203 + * 
60 | + 06418 47546 | + 1-71113 0875 + + 
0-65 | +0-14847 18617 | + 190441 7301 | + $0597 909 | + 4402039 | + Smari | + satoi 
‘70 | + 27499 58977 | + 221481 3445 | + 391515 342 | + 7.9200148 | + 7381723 | — 2.29282 
"75 | + -45493 64231 | + 270147 8618 | + 5-125083 162 | + 1604044 | + 7-563796 | —19-32806 
80 | + -70832 63007 | + 3.48732 8085 | + 728989853 | + 1 


i-0 


+0-50794 90875 
+ 37269 65391 
+ 26697 08346 


+0°18504 15946 
+ :12254 40632 
+ :07606 44694 
+ :04285 75096 


3-66478 98 
+ 1445350 868 2.81336 74 
+ 1.13688 226 2.16227 70 


1-65535 15 


67044 798 | + 1-26708 0l 
$ 49629 502 | + 09378512 
$ 35087394 | + 06899212 


p_r | i = 0

| 

0-50 Zero 

“55 | 4006507 86622 

*60 | + -10504 09132 
0-05 | 4 0:12512 23707 

70 + -12037 72027 
| +75 | + -12108 67383 
| 80 | 10307 05605 
l 


+0-01136 37553 
+ 04132 88345 


+0-08460 58919 
+ -13659 14894 
+ 19278 75164 
+ 24788 04893 


+ -01626 10216 


+ 0-05720 92470 
+ 14420 79897 
+ :30084 99544 
+ 59614 24549 


i=0 


+0-40528 47346 
+ 26820 92879 
+ 17190 36477 


＋ 0.10544 49576 
+ 06086 80938 
+ 03222 86752 
+ 01499 80883 


Table 4. (cont.) 
a = 2; b = 1


Moments of sample moments from a normal population 


1 2 3 4 
— 59576912 | — 33080440 | — 5973052 | — 911597 
— 123608 335 — 241107 26 — 46501629 — 686359 
— 094965 140 — 190464 95 — 39419357 | — 519733 
— 071722 503 — 143603 41 — 261204 0 — 3.92387 
— 62530129 | — 1.07090 88 — 2014053 | — 289284 
— 36416 278 — 079022 53 — 1559797 — 2000796 
— 22677 045 — 88259 47 — 1234401 — 102001 
a = 1; b = 2
2 3 4 
+ 0125331414 + 2-77559 18 + 5363198 + 875962 
+ 0-91528 794 | + 1-909816 59 | + 3-944086 | + 6-66400 
63626 717 + 139118 67 + 2918210 + 523275 
0-39588 585 + 0-90292 68 + 219251 9 + 432775 
17777 506 + 50048 90 + 1735359 + 390170 
03326 657 + 16593 29 + 1587061 + 3-97664 
25444 627 | — 0993188 | + 1-928250 | + 458381 
a=0; b=3 
= 
2 3 4 
Zero Zero Zero Zero 
+ 0-60027 650 | + 1-5637033 | + 3-216610 | + 521046 
+ 1-26131 036 + 3.33024 39 + 692451 6 + 11:21360 
+ 205913 484 + 556844 85 + 11-80109 6 + 19:07752 
+ 3-10858 930 + 872357 20 + 19-03930 5 + 30-60294 
+ 461173 540 + 13-67166 50 + 31163629 + 49-24546 
+ 6-97960 725 + 22-44708 89 + 5456974 8 + 82-85634 
a = 4; b = 0


2 


2-77960 780 
2-00022 412 
1:42110 843 


0-98812 112 
66450 834 
42471 523 
25066 060 


TEE +++ 


8:35771 08 
6:11033 59 
4:47100 29 


3:26979 78 
2-36773 64 
1-68335 56 
116945 84 


++++ +++ 
++++ +++ 


18-68579 9 
13-48275 0 
977832 5 


707742 6 
5-08105 7 
355091 1 
2-38785 3 


1 


34-48894 
24-65797 
17-75862 


+ +++ 


12-79818 
+ 9.16367 
+ 6-48876 
+ 4.51560 


065 
90 
15 


80 


80 


Zero 
+ 044453 354901 | — 137350 408 
* 06763 62876 | - 65130 492 


+00713003135 | — 04MM 757 
+ 06426 21606 | — — 24409 504 
+ 06130 47049 — 


Zero + 589800 73 + 
+0-00817 78724 + *13585 4 * 
+ 02661 18111 + T03188 94 * 
* 004821 22125 + 208003 73 + 
+ -06784 55029 + 190603 71 + 
+ -08167 17639 + 120871 67 E 
+ 08674 63723 + OSST 61 + 


14137167 


— 31-5595 
+0-00142 79848 
+ -01047 05405 m | 
I 
+ 0-03260 03817 ae | 
| + -07162 86470 à 
+ :13001 29691 * sams | 


+ :20862 14831 


Zero Zero + 744022033 
| +0-00024 93485 | + 015046918 | + 8-038419 15 oo 
+ -00411 96827 63247 40 +10-08419 24 
4-0-02204 38937 | + 1-55030 000 +14-09144 07 
+ -07562 27437 | + 3-12892 631 1 él 
+ 20696 71491 | + 583684 664 + 34-425! 2 
+ 50172 61482 | +10-72416 185 | +60-44444 


THE RELATION BETWEEN THE DICTIONARY DISTRIBUTION
AND THE OCCURRENCE DISTRIBUTION OF WORD LENGTH 
AND ITS IMPORTANCE FOR THE STUDY OF 
QUANTITATIVE LINGUISTICS 


By G. HERDAN, M.Sc., Ph.D., LL.D.
Lecturer in Statistics, University of Bristol 


One of the many unsolved problems of quantitative linguistics is the relation between the 
vocabulary and the occurrence distribution of specified linguistic forms, i.e. between the 
frequency distribution in the dictionary and that of occurrence in the spoken or written 
language. 

For different linguistic forms, such as phonemes, phoneme combinations (morphemes), 
word length (in terms of syllable, phoneme, letter number), the answer may conceivably be 
different. The present investigation deals with the characteristic of word length in terms of 
phoneme* and letter number per word, and arrives at the conclusion that the occurrence 
distribution can be regarded as a moment distribution of the vocabulary distribution. 
This will be shown to be a consequence of the log normality of word length distributions. 

Williams (1940, 1946) has drawn attention to the log normality of certain linguistic distributions as an empirical fact. In this paper the hypothesis of log normality is extended to a set of related linguistic variables, viz. the distributions of both word occurrence and vocabulary according to word length, in terms of the number of letters as well as of phonemes. The comparison of these distributions as lognormal variates shows the hypothesis of log normality to be of great value for the study of quantitative linguistics, practical and theoretical.


I 


The material for the investigations was provided by a count of approximately 80,000 word occurrences obtained from telephone conversations by French, Carter and Koenig of the Bell Telephone Co. (1930). The distributions according to word length of the 76,054 word occurrences and of the 738 vocabulary items, that is different words, contained in the sample, in terms of both letters and phonemes, are shown in the following table (from Herdan, 1956), where $p_i$ is the percentage of words containing $i$ units and $x_i$ the average frequency of such words in the spoken language.

As shown below, if the occurrence distribution is a moment distribution of the vocabulary distribution, their standard deviations must be sensibly equal, which will result in the parallelism of the lines representing these distributions on a logarithmic probability grid. For the phoneme distribution this is the case; for the letter distribution the standard deviations differ by 0.050, the lines being slightly convergent.

* The phoneme is the smallest linguistic unit with distinctive function in the spoken language and corresponds in magnitude to the letter in written language. It is well known that in English the correspondence between letters and phonemes is not always very close: thus the written form of the word 'thought' requires seven letters, but the spoken form is a sequence of only three phonemes: 'θɔːt'.

The plot of these distributions on a log probability grid, whose abscissa has a logarithmic scale and whose ordinate is the Gaussian integral, is in each case a sensibly straight line, which suggests log normality of the variates (Fig. 1). If the hypothesis of log normality is
correct, the distributions should comply with the characteristic features of such distributions (Aitchison & Brown, 1957). As the most characteristic feature, and one peculiar to lognormal variates,* we take the moment distribution property, according to which the $j$th moment distribution of a lognormal distribution with parameters $\mu$ and $\sigma^2$ is also a lognormal distribution, with parameters $\mu + j\sigma^2$ and $\sigma^2$ respectively. Writing $\Lambda(x \mid \mu, \sigma^2)$ for the logarithmic distribution function, where $\mu$ is the logarithmic mean and $\sigma$ the logarithmic standard deviation, we define the $j$th moment distribution function as

$$\Lambda_j(x \mid \mu, \sigma^2) = \frac{1}{\lambda_j}\int_0^x u^j\,d\Lambda(u \mid \mu, \sigma^2),$$

where $\lambda_j = e^{j\mu + \frac{1}{2}j^2\sigma^2}$ is the $j$th moment about zero.


Table 1

Columns: No. of units, i, per word; rows include Mean of log₁₀ i and S.D. of log₁₀ i. (The numerical entries are illegible in the source.)


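Proof of property (Aitchison & Brown, 1957).

$$\begin{aligned}
\Lambda_j(x \mid \mu, \sigma^2) &= \frac{1}{\lambda_j}\int_0^x u^j\,\frac{1}{\sigma\sqrt{2\pi}\,u}\exp\Big\{-\frac{(\log u - \mu)^2}{2\sigma^2}\Big\}\,du\\
&= \int_0^x \frac{1}{\sigma\sqrt{2\pi}\,u}\exp\Big\{-\frac{(\log u - \mu - j\sigma^2)^2}{2\sigma^2}\Big\}\,du\\
&= \Lambda(x \mid \mu + j\sigma^2, \sigma^2).
\end{aligned}$$

* Though not exclusively so.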
It follows that the graphs of different moment distributions should be parallel lines on log-probability paper, with a distance of $j\sigma^2$ between them.
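The moment-distribution property proved above can be checked numerically. In terms of $y = \ln x$, the $j$th moment distribution has a density proportional to $e^{jy}$ times the $N(\mu, \sigma^2)$ density. Its mean should therefore be $\mu + j\sigma^2$ and its standard deviation $\sigma$. The sketch below integrates by Simpson's rule. The function name and the illustrative values $\mu = 1.5$, $\sigma = 0.4$, $j = -2.4$ are assumptions, not figures from the paper.

```python
import math


def moment_distribution_log_stats(mu, sigma, j, steps=4000):
    """Mean and s.d. of y = ln x under the j-th moment distribution of Lambda(mu, sigma**2).

    The j-th moment distribution has density proportional to x**j times the lognormal
    density; in terms of y = ln x this is exp(j*y) times the N(mu, sigma**2) density.
    The integrals are done by Simpson's rule on a wide interval around mu.
    """
    half_width = (12.0 + abs(j) * sigma) * sigma
    lo, hi = mu - half_width, mu + half_width
    h = (hi - lo) / steps  # steps must be even

    def weight(y):  # un-normalized: exp(j*y) * exp(-(y - mu)**2 / (2 sigma**2))
        return math.exp(j * y - (y - mu) ** 2 / (2.0 * sigma**2))

    def simpson(g):
        total = g(lo) + g(hi) + sum((4 if k % 2 else 2) * g(lo + k * h) for k in range(1, steps))
        return total * h / 3.0

    m0 = simpson(weight)
    m1 = simpson(lambda y: y * weight(y)) / m0
    m2 = simpson(lambda y: y * y * weight(y)) / m0
    return m1, math.sqrt(m2 - m1 * m1)
```

With these values, `moment_distribution_log_stats(1.5, 0.4, -2.4)` should return a mean close to $1.5 - 2.4 \times 0.16 = 1.116$ and a standard deviation close to 0.4. That is the parallel shift of $j\sigma^2$ described in the text.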

Turning now to the distributions of Table 1 and Fig. 1, we find the logarithmic probability graphs for vocabulary and occurrences, for both letters and phonemes, to be sensibly parallel lines. If what is plotted are different moment distributions, the distance between the vocabulary and occurrence lines must be that required by the moment theorem.

Fig. 1. ○, word occurrence against phoneme number; ●, vocabulary against phoneme number; +, word occurrence against letter number; ×, vocabulary against letter number. (Ordinate: accumulated total as percentage of whole sample; abscissa: number of letters (phonemes) per word.)

What is plotted against log word length is $\sum_{i=1}^{k} p_i$ for vocabulary and $\sum_{i=1}^{k} p_i x_i$ for occurrence, where $p_i$ is the number of vocabulary items (different words) of length $i$, and $x_i$ the average number of occurrences of a word of length $i$. In an efficient code, such as language may be regarded to represent, word length appears to be inversely related to frequency of occurrence by a function of the form


S(t) = (aji) hi, 
where $a$, $b$ and $k$ are constants, and where $b$ is so close to unity that as a first approximation the formula may be written as

$$f(i) = a\,i^{-k}$$

(Good, 1951; Simon, 1955; Mandelbrot, 1954). In our notation $x_i = a\,i^{-k}$. If $i$ stands for the length of English words in terms of letters or phonemes, $k$ appears to be between 2 and 3, and we assume it to be 2.4. We therefore write

$$x_i \propto i^{-2.4}.$$


The log probability graphs of $\sum_{i=1}^{k} p_i$ for vocabulary and $\sum_{i=1}^{k} p_i x_i \;(\propto \sum_{i=1}^{k} p_i\,i^{-2.4})$ for occurrences, plotted against $\log i$, thus represent respectively the basic frequency distribution and the $-2.4$th moment distribution for vocabulary items according to log word length. If the variates are log normal, these graphs may therefore be expected to show the characteristic relation between distributions of different moments explained above. This is actually the case. Moreover, if the distribution of occurrence represents the $-2.4$th moment distribution of the vocabulary distribution, the averages of the two distributions, as the 0th and $-2.4$th moment means, should be $2.4\sigma^2$ apart, that is, the occurrence mean should equal $\mu - 2.4\sigma^2$. As will be seen, there is good agreement between observation and theory in this respect, the differences between the respective means being sensibly equal to $2.4\sigma^2$.

Table 2 gives in column 3 the observed difference between the vocabulary and occurrence means, and in column 6 the theoretical value.

Table 2

                  Mean of log i (base 10)
Phonemes
  Vocabulary      0.608
  Occurrences     0.414
Letters
  Vocabulary      0.703
  Occurrences     0.494

(The remaining columns are illegible in the source.)

The theoretical difference between the moment averages as given in the text is in terms of natural logarithms. If logarithms to the base 10 are used, the following transformation must be observed (Herdan, 1953). The mean and standard deviation of the log distribution are the logarithms of the geometric mean $G$ and the geometric standard deviation $\sigma_g$ respectively, viz.

$$\mu = \ln G, \qquad \sigma = \ln \sigma_g,$$

so the relation $\mu_j = \mu + j\sigma^2$ can be written as

$$\ln G_j = \ln G + j(\ln \sigma_g)^2.$$

Transforming into logarithms to the base 10, we have

$$\log_{10} G_j = \log_{10} G + 2.3026\,j\,(\log_{10}\sigma_g)^2,$$

or

$$y_j = y + 2.3026\,j\,s^2,$$

where $y = \log_{10} G$ and $s = \log_{10}\sigma_g$. The last column of Table 2 shows $2.3026\,j\,s^2$ with $j = -2.4$ and with the values of $s$ from the preceding column substituted. These figures are in good agreement with the observed differences given in the third column.
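The base-10 form of the shift, $y_j = y + 2.3026\,j\,s^2$, can be written as a small helper. Its inverse gives the value of $s$ that the relation would require for a given pair of observed log₁₀ means. Applied to the phoneme means of Table 2 (0.608 and 0.414), the inverse gives an implied $s$ of about 0.187. This is only what the relation would require, not a figure reported in the paper. The function names are hypothetical.

```python
import math

LN10 = math.log(10.0)  # 2.302585..., the constant 2.3026 used in the text


def log10_mean_shift(s, j):
    """y_j - y for the j-th moment distribution, in log10 units.

    s is the s.d. of log10 x. In natural logs sigma = s * ln 10 and the shift is
    j * sigma**2; dividing by ln 10 to return to log10 units gives ln 10 * j * s**2.
    """
    return LN10 * j * s * s


def implied_log10_sd(y, y_j, j=-2.4):
    """Invert the relation: the s.d. of log10 x implied by two observed log10 means."""
    return math.sqrt((y_j - y) / (LN10 * j))
```

For example, `implied_log10_sd(0.608, 0.414)` is about 0.1874. Substituting that value back into `log10_mean_shift(..., -2.4)` recovers the observed difference $0.414 - 0.608 = -0.194$.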
We may therefore take it as a reasonable hypothesis, supported by some statistical evidence, that:

(a) the distribution of vocabulary and that of occurrence against word length, in terms of number of letters or phonemes, satisfy the criteria for log normality;
(b) the distribution of occurrences may be regarded as the $-2.4$th moment distribution of vocabulary according to word length.

The practical importance of this lies in the fact that, knowing either the vocabulary or the occurrence distribution, it is possible to obtain immediately an estimate of the other distribution. Let us assume we had made a vocabulary count and wanted an estimate of the distribution of occurrences of that vocabulary in a continuous text of a certain length. Knowing the logarithmic mean $\mu$ and the logarithmic standard deviation $\sigma$ of the vocabulary count, we calculate the logarithmic mean of the occurrences as $\mu - 2.4\sigma^2$. Knowing that the distributions are lognormal with the same standard deviation, we can immediately construct the straight line on log probability paper which represents the cumulative distribution of occurrences. Working backwards from this, we obtain the distribution function. This may be of interest in connexion with mechanical translation: given a vocabulary with a certain distribution of words according to their length, we can quickly obtain an estimate of the distribution of occurrences in a text of a given size.

The method rests upon the assumption of an inverse relation between frequency and word length, which may be true to a different degree for different material. It follows that the multiple 2.4 will give a satisfactory solution only to the extent to which the assumption is in agreement with the facts. The less true the assumption of strict reciprocity, the more the multiple will differ from 2.4. It will therefore in general be necessary to ascertain, by a pilot investigation or from the literature, how far the reciprocity assumption holds, in order to choose the correct multiple of $\sigma^2$. For determining the constant, Mandelbrot's (1953) or Simon's (1955) formula will be found useful.

Conversely, from the distance between the log probability graphs for the vocabulary and occurrence distributions, inferences may be drawn about the extent to which the reciprocity relation (Zipf, 1949; Simon, 1955) is true for a given language.


II

It is of interest to compare the two distributions with regard to the information they contain. In an efficient coding system, $\log i \propto \log 1/p_i$ ($p_i$ standing for the probability of occurrence of symbols of length $i$) and $\sum p_i \log i \propto \sum p_i \log 1/p_i$. The first moments of our distributions therefore appear to be proportional to the information $I_v$ for vocabulary and $I_o$ for occurrences, which, according to the theory of information (Brillouin, 1956), are calculated as

$$I_v = -K\sum_i p_{vi}\log p_{vi}, \qquad I_o = -K\sum_i p_{oi}\log p_{oi},$$

where $p_{vi}$ is the probability of word length $i$ in the dictionary and $p_{oi}$ in the spoken language. For a binary system, for which $K = 3.322$, these quantities become in our case:

Letters:   I_v = 2.988 bits/word length, M = 5.417;
           I_o = 2.628 bits/word length, M = 3.508.
Phonemes:  I_v = 2.726 bits/word length, M = 4.413;
           I_o = 2.274 bits/word length, M = 2.856.
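The quantities $I$ and $M$ above are simple sums over a word-length distribution. The sketch below computes $I = -K\sum p_i \log_{10} p_i$ with $K = 1/\log_{10} 2 \approx 3.322$, so that $I$ is in bits, and $M = \sum i\,p_i$. The function names are hypothetical. The distribution in the example is made up for illustration, since the frequency tables behind the figures above are not legible here.

```python
import math

K_BINARY = 1.0 / math.log10(2.0)  # 3.3219..., the K = 3.322 of the text


def entropy_bits(probs):
    """I = -K * sum_i p_i * log10 p_i, i.e. the entropy in bits (zero probabilities skipped)."""
    return -K_BINARY * sum(p * math.log10(p) for p in probs if p > 0)


def mean_length(probs):
    """M = sum_i i * p_i for a word-length distribution over lengths 1, 2, 3, ..."""
    return sum(i * p for i, p in enumerate(probs, start=1))
```

For the made-up distribution `[0.5, 0.25, 0.25]` over lengths 1, 2 and 3, `entropy_bits` gives 1.5 bits and `mean_length` gives 1.75.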
This enables us to compare the information property of letters against phonemes and, for each of these, of vocabulary against occurrences. In using the entropy characteristic $I$ instead of the average $M$, we make use of the fundamental rule which plays a most important part in statistical thermodynamics, viz. that the average and the most probable distribution coincide for large numbers. For binary coding, the relation between the two is expressed by Shannon's fundamental theorem $I \le M$.

When trying to assess which code is more efficient, we must not be misled by the impression that a greater $I$ means more information, and thus a more efficient code. The entropy $I$, which was originally considered by Shannon as a measure of information, is now recognized (by Brillouin, for instance) as a measure of unexpectedness or lack of information. This has compelled Brillouin to introduce the concept of negentropy for measuring information. The apparent confusion of thought is resolved by the following consideration.

A small entropy (and a great redundancy) implies a small effort, in terms of number of guesses, for restoring missing information. Only a great store of information about the structure of the code will enable the receiver of the message to make correct guesses with little effort, and a small $I$ (great $R$) thus implies much past or advance information. On the other hand, if we think in terms of what is gained by such guesses, that is, future information, then it is clear that symbols with great probability of occurrence, which are easily guessed correctly, will add little to our knowledge and thus represent little information. Rarely occurring symbols, which are guessed only with difficulty, mean much information (and little redundancy).
For our data we find

(a) I_v(phon.) = 2.726 < I_v(lett.) = 2.988,
    I_o(phon.) = 2.274 < I_o(lett.) = 2.628,

and thus $I(\text{phon.}) < I(\text{lett.})$.

Further, we find

(b) I_o(phon.) = 2.274 < I_v(phon.) = 2.726,
    I_o(lett.) = 2.628 < I_v(lett.) = 2.988,

and thus $I_o < I_v$.

This means

(a) that the phonemic channel is more efficient than the letter channel; and

(b) that knowledge of occurrences makes it easier to guess at word length, and that it therefore represents some additional information over and above that provided by the mere vocabulary code.

The basis of linguistic information theory is the hypothetical solidarity between the various items of a specified linguistic characteristic, word length in our case. Such solidarity exists between the items in the dictionary and between their occurrences in the spoken and written language. It makes it possible to guess, with a certain probability of being correct, at the word length of individual items in the dictionary or in speech. The number of guesses needed to arrive at the correct result is epitomized by the value of the entropy. The fact that $I_o < I_v$ means that the uncertainty of word length in speech is less than in the dictionary. This is only what one might expect, since in speech the solidarity of word length in the dictionary is supplemented by that of occurrence, and we therefore have more information available for guessing word length correctly.
Returning now to the conclusion reached in §I, it seems rather interesting that the introduction of the lognormal hypothesis for the mathematical structure of our distributions should enable us to add, and subtract, information when only one distribution, vocabulary or occurrences, is given. For this is what the estimation of the other distribution means in terms of information theory. Given the vocabulary distribution, and deriving that of occurrences by the method established above, we reduce $I_v$ to $I_o$, which means less deciphering work because of more advance information. The converse holds if the vocabulary distribution is derived from the occurrence distribution.


SUMMARY 


1. From a representative sample of 76,054 words of spoken English, the frequency distributions of vocabulary and occurrences according to word length, in terms of letters and phonemes, were obtained and subjected to statistical and information-theoretical analysis.

2. The distributions were found to be sensibly lognormal. 

3. The relation between the occurrence and vocabulary distributions, according to either letter or phoneme number, is such that they could be interpreted as different moment distributions of a lognormal variate. This admits the derivation of one distribution from the other, when the other is known, by the moment distribution theorem for lognormal variates. The practical
moment distribution theorem for lognormal variates if the other is known. The practical 
importance of this lies in the possibility of a quick estimate of the occurrence distribution 
without having to carry out a complete word count or of the vocabulary distribution 
without a dictionary count. 

4. A transformation of the logarithmic variable of the distributions, using an established relation between frequency and length of linguistic symbols, shows the occurrence
and vocabulary means to be of the form of the theoretical measure of information. The 
comparison of the statistical results with their information-theoretical interpretation 
enables us to assess the value, in terms of information, of using the moment distribution
theorem for lognormal variates.
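Item 3 rests on the moment distribution theorem for lognormal variates. As a numerical sketch, the function below checks the standard form of the theorem: if log X is normal with mean μ and variance σ², the j-th moment distribution of X (density proportional to x^j f(x)) is again lognormal, with log-mean μ + jσ² and the same log-variance σ². The parameter values in the usage are arbitrary illustrations, not estimates from the 76,054-word sample, and the function name and grid settings are illustrative choices.

```python
import math


def moment_distribution_log_stats(mu, sigma, j, half_width=12.0, steps=20_000):
    """Mean and variance of log X under the j-th moment distribution of X.

    X is lognormal with log X ~ Normal(mu, sigma**2). The j-th moment
    distribution has density proportional to x**j * f(x). In y = log x its
    weight is exp(j*y) times the normal density of y, integrated here with
    the trapezoidal rule on a wide grid.
    """
    centre = mu + j * sigma**2  # where the weight is expected to peak
    lo, hi = centre - half_width * sigma, centre + half_width * sigma
    h = (hi - lo) / steps
    total = first = second = 0.0
    for k in range(steps + 1):
        y = lo + k * h
        # x**j times the normal density of log x, constant factors dropped
        # (subtracting j*centre only rescales and avoids overflow)
        w = math.exp(j * y - 0.5 * ((y - mu) / sigma) ** 2 - j * centre)
        if k in (0, steps):
            w *= 0.5  # trapezoidal end-point weight
        total += w
        first += w * y
        second += w * y * y
    mean = first / total
    return mean, second / total - mean**2
```

For example, `moment_distribution_log_stats(1.5, 0.4, 1)` returns a log-mean of about 1.66 (= 1.5 + 0.4²) and a log-variance of about 0.16, as the theorem predicts.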
SECOND PAPER ON STATISTICS ASSOCIATED WITH THE
RANDOM DISORIENTATION OF CUBES 


By J. K. MACKENZIE
Division of Tribophysics, Commonwealth Scientific and Industrial Research 
Organization, University of Melbourne, Australia


Theoretical density functions are obtained for the angle of disorientation (the least angle of rotation required to rotate a cube into a standard orientation) and for Min ⟨100⟩ (the least of the nine acute angles between the edges of a cube and the edges of a fixed reference cube). These density functions and their cumulative distribution functions have been evaluated numerically.


1. INTRODUCTION


In a recent paper Mackenzie & Thomson (1957) described a class of problems in three- 
dimensional geometrical probability, and some of the associated density functions were 
estimated numerically by means of random sampling. In this paper two of these density 
functions are obtained in analytical form and, together with their cumulative distribution 
functions, evaluated numerically. 

The two density functions obtained are those for the angle of disorientation and Min ⟨100⟩. These two variables can be defined as follows. Consider two cubes, A and B, and imagine A to be a reference cube with its edges parallel to a fixed set of co-ordinate axes and its centre at the origin, while B is initially coincident with A but free to rotate in any manner about the common centre of A and B. If B is given an arbitrary rotation there are 24 definite rotations which will restore B into coincidence with A; these are just the reverse of the original rotation taken together with the 24 proper symmetry operations associated with a cube having indistinguishable faces (see §2). The angle of disorientation is the least (in magnitude) of the 24 angles of rotation so obtained, while Min ⟨100⟩ is the least of the nine acute angles between the edges of the cube B in an arbitrary orientation and the edges of A.

The success of the present calculations has depended essentially on [...] of detailed calculation required, although this is still quite considerable. Since [...] can be made for a whole class of problems, including the two special [...], a formulation of this class is given in §2 together with their formal solution, which is of no practical use. Section 3 is devoted to reductions common to [...], and §4 completes the reduction for the two special cases. The fact that [...] arguments which are basically of a group-theoretical nature [...] use of group theory might make practicable a solution for the [...].

The density functions for the angle of disorientation and Min ⟨100⟩ are obtained analytically and numerically in §§5 and 6. A large amount of [...] has been omitted in these sections and only a few important intermediate results are given.


† Following the preparation of the paper, a copy of which was sent via Mr Hammersley to Mr D. C. Handscomb, the latter wrote that he had found the exact distributions of the angle of disorientation by geometrical means, and gave the formulae (5·3), (5·5) and (5·6) of my paper below. His method (the particulars of which I have not seen at the time of writing) is, I understand, quite different from mine, and I have therefore given my own derivation in full. His paper, I am informed, is to appear in the Canadian Journal of Mathematics.
2. FORMULATION OF PROBLEMS


Since the group of symmetry operations on a cube with indistinguishable faces (the cubic group) plays a fundamental role in both the formulation and the reduction of the class of problems, a brief statement of these symmetry operations will first be given.

If the cube A has indistinguishable faces it is invariant under the 48 symmetry operations of the cubic group, consisting of 24 proper rotations and 24 improper rotations, which are proper rotations together with an inversion or a reflexion. The 24 proper rotations are (a) the identity element or no rotation, (b) rotations of 180° about the three axes of reference, (c) rotations of ±90° about the same axes, (d) rotations of 180° about axes parallel to the six face diagonals of the cube, and (e) rotations of ±120° about axes parallel to the four diagonals of the cube. On taking axes of reference parallel to the edges of the cube these 24 rotations can be represented by 3 × 3 orthogonal matrices. These matrices have only three non-zero elements, which are either +1 or −1 and are arranged in all possible ways such that there is a non-zero element in each row and column and the determinant of the matrix is +1.

In all that follows, the matrices representing these proper symmetry rotations will be denoted by $S_i$ ($i = 1, \ldots, 24$); the improper rotations are then $-S_i$. Further, the 3 × 3 orthogonal matrix which represents an arbitrary proper rotation through an angle $\psi$ about an axis in the direction $[n_1 n_2 n_3]$ will be denoted by $R$, with elements $r_{ij}$ given by (3·1), and this rotation will be described briefly as either the rotation $R$, the rotation $\psi, \mathbf{n}$ or the rotation $\psi[n_1 n_2 n_3]$.

Since

$$\mathrm{Tr}(R) = r_{11} + r_{22} + r_{33} = 1 + 2\cos\psi, \qquad (2\cdot1)$$

it follows that the angle of disorientation $\psi_m$ is given by

$$1 + 2\cos\psi_m = \max_j \mathrm{Tr}(RS_j) = \max_{i,j} \mathrm{Tr}(S_i R S_j), \qquad (2\cdot2)$$

on using the facts that for any matrices $B$, $C$, $\mathrm{Tr}(BC) = \mathrm{Tr}(CB)$ provided both products exist, and that the product $S_j S_i$ is another symmetry rotation.
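Equation (2·2) reduces the angle of disorientation to a maximum over 24 traces. As a minimal sketch (the function names are illustrative, not from the paper), the 24 proper rotations can be generated as the signed permutation matrices with determinant +1 described above, and the angle recovered from the largest value of Tr(RS_j):

```python
import itertools
import math


def proper_cubic_rotations():
    """The 24 proper rotations S_i of the cube as 3 x 3 signed permutation matrices."""
    group = []
    for perm in itertools.permutations(range(3)):
        inversions = sum(perm[a] > perm[b] for a in range(3) for b in range(a + 1, 3))
        parity = -1 if inversions % 2 else 1
        for signs in itertools.product((1, -1), repeat=3):
            # determinant = parity of the permutation times the product of the signs
            if parity * signs[0] * signs[1] * signs[2] == 1:
                S = [[0] * 3 for _ in range(3)]
                for row in range(3):
                    S[row][perm[row]] = signs[row]
                group.append(S)
    return group


CUBIC = proper_cubic_rotations()


def trace_product(R, S):
    """Tr(RS) computed without forming the product."""
    return sum(R[i][k] * S[k][i] for i in range(3) for k in range(3))


def disorientation_deg(R):
    """Angle of disorientation from (2.2): 1 + 2 cos(psi_m) = max_j Tr(R S_j)."""
    best = max(trace_product(R, S) for S in CUBIC)
    cos_psi = max(-1.0, min(1.0, (best - 1.0) / 2.0))  # clamp against rounding
    return math.degrees(math.acos(cos_psi))


def rotation_about_z(deg):
    """Rotation through deg degrees about the [001] axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
```

A 60° rotation about [001] is equivalent to −30° after composing with the 90° symmetry rotation, so `disorientation_deg(rotation_about_z(60))` gives 30°, while a 90° rotation gives (up to rounding) 0°.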

Further, a generalized variable Min ⟨uvw⟩ can be defined as follows. Let $u$ be a 3 × 1 column matrix with elements equal to the direction cosines of the direction $[uvw]$, so that the set ⟨uvw⟩ of variants of $[uvw]$ are the 24 directions $S_i u$ together with the 24 directions $-S_i u$. Thus, the cosine of the angle $\theta_{ij}$ between a variant, $\pm S_i u$, and what another variant, $S_j u$, becomes after a rotation $R$, is given by $\cos\theta_{ij} = \pm u' S_i' R S_j u$ when the usual scalar product is written in matrix notation. Then

$$\cos(\mathrm{Min}\,\langle uvw\rangle) = \max_{i,j} |u' S_i' R S_j u| \qquad (2\cdot3)$$

$$\qquad\qquad = \max_{i,j} |\mathrm{Tr}(S_i' R S_j u u')|. \qquad (2\cdot4)$$

Since the variants ⟨100⟩ of [100] are parallel to the edges of the cube A, the definition of Min ⟨100⟩ given in the introduction is a special case of (2·3). Equation (2·4) leads to an equivalent definition of Min ⟨100⟩. For, if $u = [100]$, $S_j u u' S_i'$ is a matrix with only one non-zero element, which is ±1 and may be in any position in the matrix; the trace of the product with $R$ then gives ± the corresponding element in $R'$. Thus, the cosine of the angle Min ⟨100⟩ is the largest of the moduli of the elements of the orthogonal 3 × 3 matrix $R$.
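The equivalent definition just given (the cosine of Min ⟨100⟩ is the largest modulus among the elements of R) translates into a one-line function. This is a sketch with illustrative names. The test matrix, whose largest element is 2/3, is an assumed example; its angle arccos ⅔ = 48·19° is the upper limit of Min ⟨100⟩ found in §6.

```python
import math


def min_100_deg(R):
    """Min <100> in degrees: its cosine is the largest modulus of the elements of R."""
    largest = max(abs(x) for row in R for x in row)
    return math.degrees(math.acos(min(1.0, largest)))


# An orthogonal matrix with determinant +1 whose largest element has modulus 2/3.
R_WORST = [[2 / 3, 2 / 3, -1 / 3],
           [-1 / 3, 2 / 3, 2 / 3],
           [2 / 3, -1 / 3, 2 / 3]]
```

Any rotation about a cube edge leaves one element equal to 1, so its Min ⟨100⟩ is 0°.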
If in (2·4) the $S_i$, $S_j$ are allowed to range over the full cubic group, the modulus sign can be dropped. Then comparison of (2·2) with (2·4) shows that all cases are special cases of

$$x(A) = \max_{i,j} \mathrm{Tr}(S_i R S_j A), \qquad (2\cdot5)$$

where $A$ is a given (symmetric) matrix and the $S$ range independently either over the whole cubic group or over the subgroup of proper rotations. If $V(R)$ is a given probability measure on the space of (proper) orthogonal matrices, the cumulative distribution function of $x(A)$ is

$$\Pr\{x(A) \le X\} = \int dV(R), \qquad (2\cdot6)$$

where the region of integration includes all $R$ for which

$$x(A) \le X. \qquad (2\cdot7)$$

This formal solution is no more than a statement of what is required, and the practical problem is first to assign a suitable measure to $V(R)$ and second to determine the region of integration.


3. THE PROBABILITY MEASURE AND REDUCTION OF THE REGION OF INTEGRATION

When concerned with problems in geometrical probability the appropriate probability measure is determined by a principle of invariance enunciated by Deltheil (1926, p. 13). This principle asserts that the result of the calculation must be invariant for any displacement of the whole figure. In the present case, this means invariant for any rotation of the cubes A and B together as a unit.

Writing $c = 1 - \cos\psi$ and $s = \sin\psi$, the rotation $\psi, \mathbf{n}$ has the matrix representation

$$R = \begin{pmatrix} 1 - c + n_1^2 c & n_1 n_2 c - n_3 s & n_1 n_3 c + n_2 s \\ n_2 n_1 c + n_3 s & 1 - c + n_2^2 c & n_2 n_3 c - n_1 s \\ n_3 n_1 c - n_2 s & n_3 n_2 c + n_1 s & 1 - c + n_3^2 c \end{pmatrix}, \qquad (3\cdot1)$$

and in this case Deltheil (1926, p. 105) shows that the element of probability measure is given by

$$dV(R) = \frac{1}{2\pi^2}\sin^2\tfrac12\psi\,d\psi\,dS, \qquad (3\cdot2)$$

where $dS$ is an element of area on the surface of the unit (hemi-)sphere $n_1^2 + n_2^2 + n_3^2 = 1$ and, if spherical polar co-ordinates $\theta$, $\phi$ are used to specify the axis of rotation,

$$dS = \sin\theta\,d\theta\,d\phi. \qquad (3\cdot3)$$

The whole space of $R$ is covered once if $-\pi < \psi \le \pi$, $0 \le \phi < 2\pi$ and $0 \le \theta \le \tfrac12\pi$, and the total volume of the space is unity.
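Integrating (3·2) over the hemisphere (area 2π) and folding ψ and −ψ together gives the density (1/π)(1 − cos ψ) for the magnitude of the rotation angle on (0, π). The mean of that density is π/2 + 2/π. The sketch below samples random rotations as normalised Gaussian quaternions and compares the sample mean of ψ with this value. The quaternion method is a common modern technique assumed here; it is not the construction used by Mackenzie & Thomson (1957).

```python
import math
import random


def random_rotation_angle(rng):
    """Magnitude psi in [0, pi] of a uniformly distributed random rotation."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(v * v for v in q))
    # The scalar part of a unit quaternion is cos(psi/2); its sign is irrelevant.
    return 2.0 * math.acos(min(1.0, abs(q[0]) / norm))


def mean_rotation_angle(samples=100_000, seed=1):
    """Sample mean of psi (radians) over `samples` random rotations."""
    rng = random.Random(seed)
    return sum(random_rotation_angle(rng) for _ in range(samples)) / samples


# Mean of psi under the density (1/pi)(1 - cos psi) on (0, pi).
EXPECTED_MEAN = math.pi / 2 + 2 / math.pi
```

Close agreement between the two numbers (about 2·207 radians) is what one expects if both constructions realise the same invariant measure.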


The density function defined by (3·2) is the analogue of a uniform density for a one-dimensional variable with a finite range. Further, the [...] and its uniqueness arising therefrom ensure that it is identical with [...] Mackenzie & Thomson (1957) in the construction of their random orthogonal matrices.

The region of integration can now be subdivided into 24² = 576 equivalent regions. For, consider the pair of rotations $R$ and $S_i R S_j$. Since products of the type $S_j S_k$ run through the complete sequence of symmetry rotations as $S_j$ or $S_k$ do so, it follows that the value of $x(A)$ is the same for both rotations. But the invariance properties of the probability measure defined by (3·2) ensure that corresponding regions in the neighbourhood of the two rotations have equal volumes, and so only 1/576 of the total volume need be considered.
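The 576-fold reduction rests on x(A) taking the same value at R and at S_iRS_j. Below is a quick numerical check for the case A = I of (2·2). Random rotations are drawn from normalised Gaussian quaternions (an assumed sampling method), and the helper names are illustrative.

```python
import itertools
import math
import random


def proper_cubic_rotations():
    """The 24 proper rotations of the cube as 3 x 3 signed permutation matrices."""
    group = []
    for perm in itertools.permutations(range(3)):
        inversions = sum(perm[a] > perm[b] for a in range(3) for b in range(a + 1, 3))
        parity = -1 if inversions % 2 else 1
        for signs in itertools.product((1, -1), repeat=3):
            if parity * signs[0] * signs[1] * signs[2] == 1:  # determinant +1
                S = [[0] * 3 for _ in range(3)]
                for row in range(3):
                    S[row][perm[row]] = signs[row]
                group.append(S)
    return group


CUBIC = proper_cubic_rotations()


def matmul(A, B):
    """Product of two 3 x 3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def random_rotation(rng):
    """Uniform random rotation matrix from a normalised Gaussian quaternion (w, x, y, z)."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(v * v for v in q))
    w, x, y, z = (v / norm for v in q)
    return [[1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
            [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
            [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)]]


def x_identity(R):
    """x(A) of (2.5) with A = I, computed as max_j Tr(R S_j) as in (2.2)."""
    return max(sum(R[i][k] * S[k][i] for i in range(3) for k in range(3)) for S in CUBIC)


def largest_discrepancy(trials=200, seed=7):
    """Largest |x(S_i R S_j) - x(R)| over random R and randomly chosen S_i, S_j."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        R = random_rotation(rng)
        S_i, S_j = rng.choice(CUBIC), rng.choice(CUBIC)
        worst = max(worst, abs(x_identity(matmul(matmul(S_i, R), S_j)) - x_identity(R)))
    return worst
```

The discrepancy is at the level of floating-point rounding, which is the numerical counterpart of the group-closure argument above.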

The same result can also be reached using the geometrical model mentioned in the introduction. For suppose that the cube B is subjected to the sequence of rotations $S_i R S_j$. The above result now follows on using Deltheil's principle of invariance, provided that the final geometrical relationship between the cubes A and B is the same whatever the symmetry rotations $S_i$, $S_j$ may be. That this is so can be seen as follows. After the symmetry rotation $S_j$ the cube B is still coincident with the cube A, so that after the further rotation $R$ the relationship between A and B is independent of $S_j$. If the two cubes are now rotated together as a rigid body by the rotation $S_i$, the cube B reaches its final orientation and A remains invariant; thus, the final relationship between the cubes A and B is independent of $S_i$ also.

Although a preliminary subdivision of the region of integration into 24 equivalent regions can be defined in a simple way in the general case, the further subdivision of each of these regions into 24 parts is carried out in a manner suited to the two special problems. The preliminary subdivision is determined by the fact that if $R$ represents a rotation $\psi, \mathbf{n}$, then $SRS^{-1}$ represents a rotation $\psi, S\mathbf{n}$. The end-point of the unit vector $\mathbf{n}$ lies on the surface of a unit hemisphere and the product $S\mathbf{n}$ simply permutes the components of $\mathbf{n}$ in order and sign. Thus, the surface of the unit hemisphere can be divided into 24 equivalent spherical triangles bounded by great circles for which either the moduli of two components of $\mathbf{n}$ are equal or one of the components is zero. It suffices to consider those axes $\mathbf{n}$ which lie in any one of these triangles.

Since rotations about the same axis and through the same angle but in opposite senses leave the geometrical relationship between the cubes A and B unchanged, a further halving of the region of integration is achieved. Thus, it may be assumed that the angle of rotation $\psi$ is positive and that the axis of rotation lies in the spherical triangle defined by

$$n_1 \ge n_2 \ge n_3 \ge 0.$$


4. THE REGION OF INTEGRATION

It is convenient to carry out first the final subdivision and specification of the region of integration for the case of the angle of disorientation, and then show that the same region is suitable for Min ⟨100⟩.

Given any rotation $R$ there is, in general, just one equivalent rotation $RS_j$ for which the angle of rotation is a minimum. Then, there is a unique equivalent matrix $R^* = S_i R S_j S_i^{-1}$ for which the angle of rotation is least and the axis of rotation lies in the triangle defined by $n_1 \ge n_2 \ge n_3 \ge 0$. But according to the result obtained in the last section it suffices to consider only those rotations $R$ for which $\psi > 0$ and $n_1 \ge n_2 \ge n_3 \ge 0$. The final reduction of the region of integration is now determined by the conditions which ensure that $R = R^*$.

When $R$ is given by (3·1), calculation shows that the cosine of half the angle of rotation† determined by the matrix $RS$ is given by the modulus of one of the five expressions

$$\cos\tfrac12\psi, \qquad n_1\sin\tfrac12\psi, \qquad (n_1\sin\tfrac12\psi + \cos\tfrac12\psi)/\sqrt2,$$
$$(n_1 + n_2)\sin\tfrac12\psi/\sqrt2, \qquad \tfrac12\{(n_1 + n_2 + n_3)\sin\tfrac12\psi + \cos\tfrac12\psi\}, \qquad (4\cdot1)$$

together with the modulus of what these five expressions become when $n_1$, $n_2$, $n_3$ are permuted in all possible ways in order and in sign (24 values in all). Those written down correspond to the cases where $S$ is the rotation identity, 180°[100], 90°[100], 180°[110] and −120°[111]. Clearly, when $0 < \psi < \pi$ and $n_1 \ge n_2 \ge n_3 \ge 0$, the largest value of the cosine will be one of those set out explicitly in (4·1).

† If this angle happens to be negative the transposed (or inverse) matrix represents an equivalent rotation through a positive angle.


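If $R = R^*$, then $\cos\tfrac12\psi$ must be the greatest of the five expressions in (4·1) and, after some manipulation of inequalities, the region of integration is found to be

$$0 \le \tan\tfrac12\psi \le (\sqrt2 - 1)/n_1 \quad \text{for } \sqrt2\,n_1 \ge n_2 + n_3,$$
$$0 \le \tan\tfrac12\psi \le 1/(n_1 + n_2 + n_3) \quad \text{for } \sqrt2\,n_1 \le n_2 + n_3, \qquad (4\cdot2)$$
$$n_1 \ge n_2 \ge n_3 \ge 0.$$

The same region is also suitable for Min ⟨100⟩, since $r_{11}$ is the greatest element in $R^*$. For, within the region (4·2), it is easily shown by using (3·1) that

$$r_{11} \ge r_{22} \ge r_{33} \ge 0, \qquad r_{32} \ge r_{13} \ge r_{21} \ge |r_{12}|, \qquad (4\cdot3)$$

and that $r_{21} > 0$, $r_{23} < 0$ and $r_{31} < 0$, while $r_{12}$ may be either positive or negative; moreover $|r_{23}| \le r_{32}$ and $|r_{31}| \le r_{13}$, so that $r_{32}$ is the largest off-diagonal element in modulus. Finally $r_{11} \ge r_{32}$ provided

$$[1 - (n_1 + n_2 + n_3)\tan\tfrac12\psi]\,[1 - (n_1 - n_2 - n_3)\tan\tfrac12\psi] \ge 0, \qquad (4\cdot4)$$

and in the region (4·2) both factors are positive.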
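Because (4·2) was reached "after some manipulation of inequalities", it is worth checking against brute force. The sketch below draws axes with n₁ ≥ n₂ ≥ n₃ ≥ 0 and angles 0 < ψ < π, builds R from (3·1), and confirms that (4·2) holds exactly when ψ is already the least of the 24 equivalent angles given by (2·2). The sampling scheme and function names are assumptions made for illustration.

```python
import itertools
import math
import random

SQRT2 = math.sqrt(2.0)


def proper_cubic_rotations():
    """The 24 proper rotations of the cube as 3 x 3 signed permutation matrices."""
    group = []
    for perm in itertools.permutations(range(3)):
        inversions = sum(perm[a] > perm[b] for a in range(3) for b in range(a + 1, 3))
        parity = -1 if inversions % 2 else 1
        for signs in itertools.product((1, -1), repeat=3):
            if parity * signs[0] * signs[1] * signs[2] == 1:
                S = [[0] * 3 for _ in range(3)]
                for row in range(3):
                    S[row][perm[row]] = signs[row]
                group.append(S)
    return group


CUBIC = proper_cubic_rotations()


def rodrigues(psi, n):
    """Rotation matrix (3.1) for angle psi (radians) about the unit axis n."""
    n1, n2, n3 = n
    c, s = 1 - math.cos(psi), math.sin(psi)
    return [[1 - c + n1 * n1 * c, n1 * n2 * c - n3 * s, n1 * n3 * c + n2 * s],
            [n2 * n1 * c + n3 * s, 1 - c + n2 * n2 * c, n2 * n3 * c - n1 * s],
            [n3 * n1 * c - n2 * s, n3 * n2 * c + n1 * s, 1 - c + n3 * n3 * c]]


def least_angle(R):
    """Least equivalent rotation angle (radians) from (2.2)."""
    best = max(sum(R[i][k] * S[k][i] for i in range(3) for k in range(3)) for S in CUBIC)
    return math.acos(max(-1.0, min(1.0, (best - 1) / 2)))


def in_region(psi, n):
    """Region (4.2) for an axis with n1 >= n2 >= n3 >= 0 and 0 < psi < pi."""
    n1, n2, n3 = n
    t = math.tan(psi / 2)
    if SQRT2 * n1 >= n2 + n3:
        return t <= (SQRT2 - 1) / n1
    return t <= 1 / (n1 + n2 + n3)


def region_check(samples=5000, seed=3):
    """Return (number of disagreements, number of samples inside (4.2))."""
    rng = random.Random(seed)
    mismatches = inside = 0
    for _ in range(samples):
        v = sorted((abs(rng.gauss(0.0, 1.0)) for _ in range(3)), reverse=True)
        norm = math.sqrt(sum(x * x for x in v))
        n = [x / norm for x in v]  # unit axis with n1 >= n2 >= n3 >= 0
        psi = rng.uniform(1e-3, math.pi - 1e-3)
        is_least = abs(least_angle(rodrigues(psi, n)) - psi) < 1e-9
        in_reg = in_region(psi, n)
        inside += in_reg
        mismatches += is_least != in_reg
    return mismatches, inside
```

Zero disagreements over several thousand random cases, with a good fraction of them inside the region, supports the inequalities as written.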


5. THE ANGLE OF DISORIENTATION

When attention is restricted to the region of integration defined by (4·2) and the axis of rotation is specified in spherical polar co-ordinates, the probability element (3·2) becomes

$$dV(R) = (576/\pi^2)\sin\theta\,\sin^2\tfrac12\psi\,d\theta\,d\phi\,d\psi. \qquad (5\cdot1)$$

Thus, the density function for the angle of disorientation $\psi$ is found by integrating (5·1) with respect to $\theta$ and $\phi$ with $\psi$ fixed and for all $\theta$ and $\phi$ within the region (4·2), i.e. over part of a unit sphere.

The spherical triangle within which $n_1 \ge n_2 \ge n_3 \ge 0$ is shown as STU on part of a stereographic projection (Barrett, 1952) in Fig. 1. The region ABSU contains the part of the region of integration determined by the first set of inequalities in (4·2), while ATB contains the part determined by the second set. Now for a fixed $\psi$ the first inequality of (4·2) can be written

$$n_1 \le (\sqrt2 - 1)/\tan\tfrac12\psi, \qquad (5\cdot2)$$

and it is clear that for $0 < \tan\tfrac12\psi \le \sqrt2 - 1$, i.e. 0 < ψ ≤ 45°, this inequality is always satisfied, while the second inequality is always satisfied for 0 < ψ < 60°. Thus for 0 < ψ ≤ 45° the region of integration is the whole of STU. When ψ > 45°, equality in (5·2) defines a small circle with its centre at [100], so that for 45° < ψ < 60° the region is STU with the part SP₁Q₁ removed. Likewise, when ψ is just greater than 60°, a region determined by a small circle centred on [111] is removed [...], and the region P₁P₁′Q₁′UQ₁ remains; it is readily verified that the arc [...] of this circle [...] joining U and B. As ψ increases further the arcs P₁Q₁ and P₁′Q₁′ approach one another until P₁ and P₁′ coincide with B (and Q₁ with U) when $\tan\tfrac12\psi = \sqrt2(\sqrt2 - 1)$, or ψ = 60·72°. Finally, the common point P₁ moves along the arc BA until the region of integration disappears at A when $\tan\tfrac12\psi = (\sqrt2 - 1)(5 - 2\sqrt2)^{1/2}$, i.e. ψ = 62·80°.
Thus, there are four ranges of ψ to consider and, within each, the density function takes a different analytical form. Each range is considered in turn below, and the densities have all been multiplied by π/180 so that ψ is measured in degrees.

(a) $0 \le \tan\tfrac12\psi \le \sqrt2 - 1$, or 0 ≤ ψ ≤ 45°. The surface area of the triangle STU is π/12, so that the density function is

$$p(\psi) = (2/15)(1 - \cos\psi). \qquad (5\cdot3)$$
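The step from (5·1) to (5·3) can be written out as a short worked derivation. It uses only the triangle area π/12 (one forty-eighth of the unit sphere) and the factor π/180 that converts the density to degrees:

$$
\begin{aligned}
p(\psi)\,d\psi &= \frac{576}{\pi^2}\,\sin^2\tfrac12\psi\,d\psi \iint_{STU}\sin\theta\,d\theta\,d\phi
 = \frac{576}{\pi^2}\cdot\frac{\pi}{12}\,\sin^2\tfrac12\psi\,d\psi \\
&= \frac{48}{\pi}\,\sin^2\tfrac12\psi\,d\psi = \frac{24}{\pi}\,(1-\cos\psi)\,d\psi, \\
\frac{\pi}{180}\cdot\frac{24}{\pi}\,(1-\cos\psi) &= \frac{2}{15}\,(1-\cos\psi).
\end{aligned}
$$

For example, at ψ = 45° this gives (2/15)(1 − 1/√2) ≈ 0·03905, the entry in Table 1.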


(b) $\sqrt2 - 1 \le \tan\tfrac12\psi \le 1/\sqrt3$, or 45° ≤ ψ ≤ 60°. In this case, a contribution from the area SP₁Q₁ must be subtracted from (5·3). Using polar co-ordinates $\theta$, $\phi$ with [100] as pole and $\phi = 0$ on SU, the area SP₁Q₁ is given by

$$\int_0^{\frac14\pi}\int_{\eta}^{1} d(\cos\theta)\,d\phi = \tfrac14\pi\{1 - (\sqrt2 - 1)\cot\tfrac12\psi\}, \qquad (5\cdot4)$$

where $\eta = (\sqrt2 - 1)\cot\tfrac12\psi$ is the value of $n_1 = \cos\theta$ given by (5·2) with the sign of equality. Thus,

$$p(\psi) = (2/15)[3(\sqrt2 - 1)\sin\psi - 2(1 - \cos\psi)]. \qquad (5\cdot5)$$

Fig. 1. Part of a stereographic projection of a hemisphere, showing the region of integration for calculating the density function for the angle of disorientation.

(c) $1/\sqrt3 \le \tan\tfrac12\psi \le \sqrt2(\sqrt2 - 1)$, or 60° ≤ ψ ≤ 60·72°. In addition to the area given by (5·4), the area of the region TP₁′Q₁′ must also be subtracted from STU. This is most simply done in terms of polar co-ordinates with [111] as pole and $\phi = 0$ on TS; the range of $\phi$ is 0 to $\tfrac13\pi$ and the limits for $\cos\theta$ are 1 and $(\cot\tfrac12\psi)/\sqrt3$. The final result is

$$p(\psi) = (2/15)[\{3(\sqrt2 - 1) + 4/\sqrt3\}\sin\psi - 6(1 - \cos\psi)]. \qquad (5\cdot6)$$


(d) $\sqrt2(\sqrt2 - 1) \le \tan\tfrac12\psi \le (\sqrt2 - 1)(5 - 2\sqrt2)^{1/2}$, or 60·72° ≤ ψ ≤ 62·80°. This case is a little more complicated than the preceding cases, as it involves finding the area of the region BP₁Q₁U (where B and U are joined by a small circle centred at [100]) and the area of an analogous region on the other side of AB. Using polar co-ordinates as in (b), it is found that for a fixed $\theta$ the extreme values of $\phi$ are $\arccos(\cot\theta)$ and [...], respectively. Thus [...]

$$\int [\ldots - 2\arccos\{z/(1 - z^2)^{1/2}\}]\,dz, \qquad (5\cdot7)$$

where $\eta$ is the same as in (5·4). Similarly, using [...], the area on the other side of AB is found to be

$$[\ldots] \qquad (5\cdot8)$$

the range of integration being from [...]. Finally, evaluating these integrals and combining the results,

$$\begin{aligned} p(\psi) = {} & (2/15)[\{3(\sqrt2 - 1) + 4/\sqrt3\}\sin\psi - 6(1 - \cos\psi)] \\ & - (8/5\pi)[2(\sqrt2 - 1)\arccos(X\cot\tfrac12\psi) + (1/\sqrt3)\arccos(Y\cot\tfrac12\psi)]\sin\psi \\ & + (8/5\pi)[2\arccos\{(\sqrt2 + 1)X/\sqrt2\} + \arccos\{(\sqrt2 + 1)Y/\sqrt2\}](1 - \cos\psi), \end{aligned} \qquad (5\cdot9)$$

where

$$X = (\sqrt2 - 1)/[1 - (\sqrt2 - 1)^2\cot^2\tfrac12\psi]^{1/2}, \qquad Y = (\sqrt2 - 1)^2/[3 - \cot^2\tfrac12\psi]^{1/2}. \qquad (5\cdot10)$$

The density function has been computed from (5·3), (5·5), (5·6) and (5·9) and is tabulated together with the cumulative distribution function in Table 1; the latter function was obtained by numerical integration of the density function. The mean, the standard deviation and the median were calculated to be

$$\bar\psi = 40{\cdot}736^\circ, \qquad \sigma = 11{\cdot}315^\circ, \qquad \psi_{\mathrm{med}} = 42{\cdot}341^\circ. \qquad (5\cdot11)$$


Table 1. Distribution of the angle of disorientation

  ψ°      p(ψ)      c.d.f.         p(ψ)      c.d.f.
                               (ψ from 60° upwards)

   0    0·00000   0·00000        0·01015   0·99228
   5    0·00051   0·00085        0·00856   0·99415
  10    0·00203   0·00676        0·00695   0·99570
  15    0·00454   0·02277        0·00533   0·99693
  20    0·00804   0·05383        0·00434   0·99752
  25    0·01249   0·10477        0·00283   0·99850
  30    0·01786   0·18028        0·00151   0·99935
  35    0·02411   0·28487        0·00070   0·99978
  40    0·03119   0·42280        0·00024   0·99996
  45    0·03905   0·59810        0·00003   1·00000
  50    0·03167   0·77586        0·00000   1·00000
  55    0·02201   0·91097
  60    0·01015   0·99228

c.d.f. = cumulative distribution function.
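The four analytical forms (5·3), (5·5), (5·6) and (5·9) can be coded directly and integrated numerically. Doing so reproduces entries of Table 1 and the mean in (5·11). This is a sketch: Simpson's rule is an assumed choice of quadrature (the paper does not say which method was used), and the integration is split at the break points 45°, 60° and 60·72°, where the density changes form.

```python
import math

SQRT2, SQRT3 = math.sqrt(2.0), math.sqrt(3.0)


def psi_from_tan_half(value):
    """Angle psi in degrees with tan(psi/2) = value."""
    return 2.0 * math.degrees(math.atan(value))


# Break points between the ranges (a)-(d), and the largest possible angle.
PSI_45 = psi_from_tan_half(SQRT2 - 1)
PSI_60 = psi_from_tan_half(1 / SQRT3)
PSI_6072 = psi_from_tan_half(SQRT2 * (SQRT2 - 1))
PSI_MAX = psi_from_tan_half((SQRT2 - 1) * math.sqrt(5 - 2 * SQRT2))


def safe_acos(x):
    """arccos with the argument clamped to [-1, 1] against rounding error."""
    return math.acos(max(-1.0, min(1.0, x)))


def density(psi):
    """p(psi) per degree, from (5.3), (5.5), (5.6) and (5.9)."""
    if psi < 0.0 or psi > PSI_MAX:
        return 0.0
    r = math.radians(psi)
    s, omc = math.sin(r), 1.0 - math.cos(r)  # sin(psi) and 1 - cos(psi)
    if psi <= PSI_45:
        return 2 / 15 * omc  # (5.3)
    if psi <= PSI_60:
        return 2 / 15 * (3 * (SQRT2 - 1) * s - 2 * omc)  # (5.5)
    p_c = 2 / 15 * ((3 * (SQRT2 - 1) + 4 / SQRT3) * s - 6 * omc)  # (5.6)
    if psi <= PSI_6072:
        return p_c
    cot = 1.0 / math.tan(r / 2)
    X = (SQRT2 - 1) / math.sqrt(1 - (SQRT2 - 1) ** 2 * cot**2)  # (5.10)
    Y = (SQRT2 - 1) ** 2 / math.sqrt(3 - cot**2)
    k = 8 / (5 * math.pi)
    return (p_c  # (5.9)
            - k * (2 * (SQRT2 - 1) * safe_acos(X * cot) + safe_acos(Y * cot) / SQRT3) * s
            + k * (2 * safe_acos((SQRT2 + 1) * X / SQRT2)
                   + safe_acos((SQRT2 + 1) * Y / SQRT2)) * omc)


def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def moment(power, upper=PSI_MAX):
    """Integral of psi**power * p(psi) over [0, upper], split at the break points."""
    edges = [0.0] + [b for b in (PSI_45, PSI_60, PSI_6072) if b < upper] + [upper]
    return sum(simpson(lambda x: x**power * density(x), lo, hi)
               for lo, hi in zip(edges, edges[1:]))
```

With these definitions `moment(0)` is 1 to within quadrature error, `moment(0, 60.0)` is about 0·99228 and `moment(1)` is about 40·74°. The density vanishes at ψ = 62·80°.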
The density function has also been plotted in Fig. 2 and has a sharp peak at 45°; in fact the first derivative is discontinuous at ψ = 45° and 60°, while the second derivative is discontinuous at 60·72°. This confirms substantially the guess made by Mackenzie & Thomson (1957) concerning the true nature of the distribution. The dots on Fig. 2 give a graphically smoothed estimate of the density function obtained from the random sampling calculations. The agreement between this estimate and the true density function is rather better than would be expected, since a sample of only 150 was used.

Fig. 2. The density function for the angle of disorientation. The ordinate is probability density when the angle is measured in degrees and the dots are estimates derived from random sampling.
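A modern analogue of the random-sampling check can use far more than 150 rotations. The sketch below assumes uniform rotations generated as normalised Gaussian quaternions and uses (2·2) for the disorientation. It should give a sample mean close to the 40·736° of (5·11), a proportion below 45° close to the c.d.f. value 0·59810 in Table 1, and no angle above 62·80°. Names and sample size are illustrative.

```python
import itertools
import math
import random


def cubic_group():
    """Proper rotations S of the cube stored as (perm, signs), meaning S[k][perm[k]] = signs[k]."""
    group = []
    for perm in itertools.permutations(range(3)):
        inversions = sum(perm[a] > perm[b] for a in range(3) for b in range(a + 1, 3))
        parity = -1 if inversions % 2 else 1
        for signs in itertools.product((1, -1), repeat=3):
            if parity * signs[0] * signs[1] * signs[2] == 1:  # determinant +1
                group.append((perm, signs))
    return group


CUBIC = cubic_group()


def random_rotation(rng):
    """Uniform random rotation matrix built from a normalised Gaussian quaternion."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(v * v for v in q))
    w, x, y, z = (v / norm for v in q)
    return [[1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
            [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
            [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)]]


def disorientation_deg(R):
    """Angle from (2.2); Tr(RS) = sum_k signs[k] * R[perm[k]][k] for S stored as above."""
    best = max(sum(signs[k] * R[perm[k]][k] for k in range(3)) for perm, signs in CUBIC)
    return math.degrees(math.acos(max(-1.0, min(1.0, (best - 1.0) / 2.0))))


def sample_disorientations(samples=20_000, seed=11):
    """Disorientation angles (degrees) of `samples` random rotations."""
    rng = random.Random(seed)
    return [disorientation_deg(random_rotation(rng)) for _ in range(samples)]
```

Storing each symmetry rotation as a permutation with signs keeps every trace to three multiplications, which makes tens of thousands of samples cheap in pure Python.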


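6. MIN ⟨100⟩

If $\alpha$ is the value of Min ⟨100⟩ then, using (3·1) and the result of §4, it follows that

$$\sin\tfrac12\alpha = \sin\theta\,\sin\tfrac12\psi, \qquad (6\cdot1)$$

and the probability element (5·1) becomes

$$dV(R^*) = (288/\pi^2)\sin\alpha\,d\alpha\,d\beta\,d\phi, \qquad (6\cdot2)$$

where

$$\beta = \arcsin(t\cot\theta), \qquad (6\cdot3)$$

$$t = \tan\tfrac12\alpha. \qquad (6\cdot4)$$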
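The substitution (6·1)-(6·4) can be checked numerically. Take the polar axis of (θ, φ) along [100], so that n₁ = cos θ. The element r₁₁ of (3·1) should then equal cos α. Equating (5·1) with (6·2) also requires sin θ sin² ½ψ = ½ sin α |∂(α, β)/∂(θ, ψ)|. The finite-difference Jacobian below is an illustrative check, not part of the original derivation, and the function names are hypothetical.

```python
import math


def alpha_beta(theta, psi):
    """(alpha, beta) from (6.1), (6.3) and (6.4); all angles in radians."""
    half_alpha = math.asin(math.sin(theta) * math.sin(psi / 2))  # (6.1)
    t = math.tan(half_alpha)                                     # (6.4)
    beta = math.asin(t / math.tan(theta))                        # (6.3)
    return 2 * half_alpha, beta


def r11(theta, psi):
    """Element r11 of (3.1) when n1 = cos(theta)."""
    c = 1 - math.cos(psi)
    return 1 - c + math.cos(theta) ** 2 * c


def jacobian_check(theta, psi, h=1e-6):
    """Return both sides of sin(theta) sin^2(psi/2) = (1/2) sin(alpha) |d(alpha, beta)/d(theta, psi)|."""
    a_tp, b_tp = alpha_beta(theta + h, psi)
    a_tm, b_tm = alpha_beta(theta - h, psi)
    a_pp, b_pp = alpha_beta(theta, psi + h)
    a_pm, b_pm = alpha_beta(theta, psi - h)
    # central differences for the 2 x 2 Jacobian determinant
    det = ((a_tp - a_tm) * (b_pp - b_pm) - (a_pp - a_pm) * (b_tp - b_tm)) / (4 * h * h)
    alpha, _ = alpha_beta(theta, psi)
    lhs = math.sin(theta) * math.sin(psi / 2) ** 2
    rhs = 0.5 * math.sin(alpha) * abs(det)
    return lhs, rhs
```

The two sides agree to the accuracy of the finite differences. This is the same statement as the passage from the factor 576/π² in (5·1) to 288/π² in (6·2).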


The density function for $\alpha$ is found by integrating (6·2) with respect to $\beta$ and $\phi$ over the region determined by (4·2) and (6·1). The main difficulties are the determination of the appropriate limits of integration and the reduction of the double integrals to single integrals.

The limits of integration are determined in two steps. First, a diagram is constructed which shows the limits of the variables $\theta$ and $\psi$ for the case where the $\phi$ integration is carried out first. Most of the results required to do this are available from the calculations in the preceding section, the only extension necessary arising from the fact that no use can now be made of the simplifications which arose previously by the use of a pole at [111]. The second step is to use this diagram together with (6·1) to obtain the limits for $\theta$, and hence $\beta$, when $\alpha$ is fixed.


The diagram is shown in Fig. 3, and the limits for $\phi$ in the three regions are

$$\text{I:}\quad 0 \le \phi \le \tfrac14\pi,$$
$$\text{II:}\quad \arccos(\cot\theta) \le \phi \le \tfrac14\pi,$$
$$\text{III:}\quad \arccos(\cot\theta) \le \phi \le \tfrac14\pi - \arccos[(\cot\tfrac12\psi - \cos\theta)/\sqrt2\sin\theta], \qquad (6\cdot5)$$
$$\text{or}\quad \arccos(t^{-1}\sin\beta) \le \phi \le \tfrac14\pi - \arccos[t^{-1}\sin(\tfrac14\pi - \beta)].$$

Fig. 3. Diagrams showing the limits of the variables θ and ψ when the φ integration is carried out first. Some typical curves with α a constant are shown dotted.

For a fixed $\psi$, the boundaries S′S, BU and TT′ are determined by the values of $\theta$ at the corresponding points in Fig. 1; the values of $\cos\theta$ are 1, $1/\sqrt2$ and $1/\sqrt3$, respectively. The boundary S′BA is determined by the value of $\theta$ on the arc P₁Q₁ in Fig. 1, BT′ by the value of $\theta$ at P₁′ and AT′ by the value of $\theta$ at Q₁′. Thus, on

$$\text{S}'\text{BA:}\quad \cos\theta = (\sqrt2 - 1)\cot\tfrac12\psi,$$
$$\text{BT}'\text{:}\quad \cos\theta + \sqrt2\sin\theta = \cot\tfrac12\psi, \qquad (6\cdot6)$$
$$\text{AT}'\text{:}\quad \cos\theta + \sin\theta(\cos\phi + \sin\phi) = \cot\tfrac12\psi, \quad \cos\phi = \cot\theta.$$
The dotted curves like hyperbolae drawn on Fig. 3 are curves of constant $\alpha$ defined by (6·1), and the change of variables implied by (6·2) means that the integration with respect to $\beta$ is along these curves on the diagram. The limiting values of $\beta$ are determined by the intersections of the dotted curves with the solid boundaries drawn on Fig. 3. After some calculation it is found that at

$$\begin{aligned}
X_r \text{ on S}'\text{BA:}&\quad \beta = \tfrac18\pi,\\
Y_r \text{ on BT}'\text{:}&\quad \beta = \tfrac14\pi - \arcsin t,\\
Z_r \text{ on AT}'\text{:}&\quad \beta = \arcsin[t\cos(\tfrac18\pi + \gamma)],\\
T_r \text{ on TT}'\text{:}&\quad \beta = \arcsin(t/\sqrt2),\\
U_r \text{ on UB:}&\quad \beta = \arcsin t,
\end{aligned} \qquad (6\cdot7)$$

where $\sin^2\gamma = [(\sqrt2 + 1)^2 t^2 - 1]/4\sqrt2\,t^2$.

Clearly there are three ranges of $\alpha$ to be considered, according as the curve $\alpha$ = constant intersects S′B, BA or AT′, when the subscript $r$ in (6·7) takes the values 0, 1 or 2, respectively.

The required integrals are all reduced in much the same way and only the case where the curve $\alpha$ = constant intersects S′B will be treated; in this case

$$0 \le \tan\tfrac12\alpha \le \sin\tfrac18\pi, \quad \text{or} \quad 0 \le \alpha \le 41{\cdot}88^\circ. \qquad (6\cdot8)$$

Using Fig. 3, (6·5) and (6·7), the required double integral is written down and the integration over $\phi$ carried out first. Then the term $\arccos(t^{-1}\sin\beta)$ arising from this integration is immediately replaced by an integral again. Omitting the factor $(288/\pi^2)\sin\alpha$, the [...] at this stage is

$$[\ldots] \qquad (6\cdot9)$$

Finally, inverting the order of integration in the second integral and evaluating gives

$$\tfrac1{32}\pi^2 - \int_0^{\frac14\pi}\arcsin(t\cos\phi)\,d\phi. \qquad (6\cdot10)$$

The same result (6·10) is obtained for the next range of $\alpha$, but a different result is obtained in the third range of $\alpha$. When $\alpha$ is measured in degrees, the required density function is

$$p(\alpha) = (8/5\pi)\sin\alpha\,\Bigl[\tfrac1{32}\pi^2 - \int_0^{\frac14\pi}\arcsin(t\cos\phi)\,d\phi\Bigr]$$

when 0 ≤ α ≤ 45°, and

$$p(\alpha) = [\ldots] \qquad (6\cdot11)$$

when 45° ≤ α ≤ arccos ⅔ = 48·19°, and where

$$t = \tan\tfrac12\alpha, \qquad \sin^2\gamma = [(\sqrt2 + 1)^2 t^2 - 1]/4\sqrt2\,t^2. \qquad (6\cdot12)$$
For small α, equation (6·11) gives p(α) ≈ (π/20) sin α, in agreement with the estimate made by Mackenzie & Thomson (1957).

Both the density function and its cumulative distribution function have been computed, and the results are given in Table 2 and Fig. 4. The density function and its first derivative are continuous at α = 45°, but the second derivative is discontinuous there. The dots on Fig. 4 give a graphically smoothed estimate of the density function obtained from the random sampling calculations, and again agreement with the true density function is better than would have been expected. The mean, standard deviation, median and mode were calculated to be

$$\bar\alpha = 23{\cdot}104^\circ, \qquad \sigma = 10{\cdot}312^\circ, \qquad \alpha_{\mathrm{med}} = 23{\cdot}183^\circ, \qquad \alpha_{\mathrm{mode}} = 23{\cdot}308^\circ. \qquad (6\cdot13)$$

Table 2. Distribution of angle Min ⟨100⟩

Fig. 4. The density function for Min ⟨100⟩. The ordinate is probability density when the angle is measured in degrees and the dots are estimates derived from random sampling.
The integrals in (6·11) were evaluated by direct numerical integration, while the integral in (6·10) was calculated from the power series

$$\sqrt2\int_0^{\frac14\pi}\arcsin(t\cos\phi)\,d\phi = \sum_{n=1}^{\infty} a_n t^{2n-1}, \qquad (6\cdot14)$$

where

$$a_1 = 1, \quad a_2 = 5/36, \quad a_3 = 43/800, \quad a_4 = 177/6272,$$
$$a_5 = 2867/165{,}888, \quad a_6 = 11{,}531/991{,}232, \qquad (6\cdot15)$$
$$a_7 = 92{,}479/11{,}075{,}584, \quad a_8 = 74{,}069/11{,}796{,}480.$$
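The coefficients (6·15) follow from expanding arcsin term by term. With s = sin φ, √2 ∫₀^{π/4} cos^{2m+1} φ dφ = Σₖ C(m, k)(−½)^k/(2k + 1), which is rational, so each aₙ is an exact fraction. The sketch below recomputes the coefficients with Python's `fractions` module and compares the truncated series with direct quadrature. It also evaluates the first formula of (6·11), taken here as p(α) = (8/5π) sin α [π²/32 − ∫₀^{π/4} arcsin(t cos φ) dφ] for 0 ≤ α ≤ 45°. At α = 45° this should agree with the constant 0·002896 in (6·16) and with the coefficient of x there, −0·0027632, taking x = α − 45. Function names are illustrative.

```python
import math
from fractions import Fraction

# The coefficients printed in (6.15).
PUBLISHED = [Fraction(1), Fraction(5, 36), Fraction(43, 800), Fraction(177, 6272),
             Fraction(2867, 165888), Fraction(11531, 991232),
             Fraction(92479, 11075584), Fraction(74069, 11796480)]


def series_coefficient(n):
    """Exact a_n of (6.14): sqrt(2) * int_0^{pi/4} arcsin(t cos phi) dphi = sum a_n t**(2n-1)."""
    m = n - 1
    # coefficient of x**(2m+1) in the Maclaurin series of arcsin x
    arcsin_coeff = Fraction(math.comb(2 * m, m), 4**m * (2 * m + 1))
    # sqrt(2) * int_0^{pi/4} cos**(2m+1)(phi) dphi, expanded in s = sin(phi) up to s = 1/sqrt(2)
    cos_moment = sum(Fraction(math.comb(m, k) * (-1) ** k, 2**k * (2 * k + 1))
                     for k in range(m + 1))
    return arcsin_coeff * cos_moment


def integral_by_series(t, terms=8):
    """int_0^{pi/4} arcsin(t cos phi) dphi from the first `terms` terms of (6.14)."""
    total = sum(float(series_coefficient(n)) * t ** (2 * n - 1) for n in range(1, terms + 1))
    return total / math.sqrt(2)


def integral_by_quadrature(t, steps=2000):
    """The same integral by composite Simpson's rule."""
    def f(phi):
        return math.asin(t * math.cos(phi))
    h = (math.pi / 4) / steps
    total = f(0.0) + f(math.pi / 4)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3


def density_min100(alpha_deg):
    """p(alpha) per degree for 0 <= alpha <= 45 degrees (first formula of (6.11))."""
    a = math.radians(alpha_deg)
    t = math.tan(a / 2)
    return 8 / (5 * math.pi) * math.sin(a) * (math.pi**2 / 32 - integral_by_series(t))
```

Eight terms are ample here, because t = tan ½α never exceeds tan 22·5° ≈ 0·414 in the range 0 ≤ α ≤ 45°.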
In the neighbourhood of α = 45° the behaviour of the density function is given by

$$p(\alpha) = p_-(x) = 0{\cdot}002896 - 0{\cdot}0027632\,x - 0{\cdot}000070\,x^2, \qquad (6\cdot16)$$

for $x = \alpha - 45$ negative and


P(x) = p,(x) = p. (a) 4 000110723 + 0-000018z1, (6:17) 
for x positive. Near the limit of the distribution at α = 48·19°
P(x) = 0-000217 (a — 48-19)*, (6:18) 


in agreement with the behaviour predicted by Mackenzie & Thomson (1957). 


I wish to thank Mr D. C. Handscomb for communicating to me the final result for the 
density function for the angle of disorientation. 
CONDITIONED MARKOV PROCESSES


By W. A. O'N. WAUGH 
The Canberra University College 


1. INTRODUCTION

The problem that we shall consider can conveniently be introduced by means of the following example. Suppose a particle performs a simple one-dimensional random walk, making steps at successive discrete epochs of time, its possible positions being represented by the positive integers and zero. Suppose that zero is the only absorbing state, and that when the particle is at any point $j$ ($\ge 1$) it has probabilities $p$ to move to $j + 1$ and $q$ ($= 1 - p$) to move to $j - 1$ at the next epoch.

If the particle starts at the point 1 at time 0, and if $p > q$, then it is well known that the probability of ultimate absorption in the state 0 is $q/p$. We shall denote this event by $A$.

A finite sequence $(x_0, \ldots, x_n)$ of non-negative integers, with $x_0 = 1$, can represent successive positions of the particle at the first $n + 1$ epochs, and we shall call it a path of $n$ steps. Consider a path in which there are $r$ steps to the right and $n - r$ steps to the left, and in which $x_j > 0$ for $j = 0, 1, \ldots, n$. Then $x_n = 2r - n + 1$, and the probability that the particle performs this path and is ultimately absorbed at the origin is

$$P\{x_0, \ldots, x_n, A\} = p^r q^{n-r} (q/p)^{2r-n+1}.$$

Hence, the conditional probability of the path on the hypothesis of $A$ is

$$P\{x_0, \ldots, x_n \mid A\} = q^r p^{n-r}.$$

This will be seen to be identical with the unconditional probability of the same path for a particle which executes a similar random walk but has the roles of $p$ and $q$ interchanged, i.e. moves to the right with probability $q$ and to the left with probability $p$.


The same interchange can easily be shown to hold if $x_m = 0$ for some $m \le n$. This result can be described in more general terms as follows. The first random walk defines a measure $P\{\cdot\}$ over the space of sequences $\{x_0, x_1, \ldots\}$, in which paths are the finite-dimensional measurable sets, and $A$ is an event of probability $q/p$. The second random walk, with $p$ and $q$ interchanged, defines another measure, which we can denote by $P^*\{\cdot\}$. Then if $E$ is a measurable event,

$$P\{E \mid A\} = P^*\{E\}. \qquad (1\cdot1)$$
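The interchange of p and q can be checked with exact rational arithmetic. The sketch assumes, as one standard way of realising (1·1), that the conditioned walk reweights each step from j to k by the ratio h(k)/h(j) of absorption probabilities (a Doob h-transform). The value p = 3/5 is an arbitrary example with p > q, and the function names are illustrative.

```python
from fractions import Fraction


def absorption_probability(p, j):
    """Probability (q/p)**j that the walk started at j >= 0 is ever absorbed at 0 (p > q)."""
    q = 1 - p
    return (q / p) ** j


def conditioned_steps(p, j):
    """Right and left step probabilities at j >= 1 after conditioning on absorption at 0.

    Assumed construction: weight the transition j -> k by h(k) / h(j), where h is
    the absorption probability above.
    """
    q = 1 - p
    h = absorption_probability
    right = p * h(p, j + 1) / h(p, j)
    left = q * h(p, j - 1) / h(p, j)
    return right, left


def conditional_path_probability(p, steps):
    """P{x_0, ..., x_n | A} for a path from x_0 = 1 with steps +1 or -1 that stays positive."""
    q = 1 - p
    x, prob = 1, Fraction(1)
    for step in steps:
        prob *= p if step == 1 else q
        x += step
        if x <= 0:
            raise ValueError("path must satisfy x_j > 0 for all j")
    return prob * absorption_probability(p, x) / absorption_probability(p, 1)
```

With p = 3/5, `conditioned_steps` returns (2/5, 3/5) at every j ≥ 1, so the conditioned walk is the walk with p and q interchanged. The conditional probability of the path (1, 2, 3, 2, 3) is q³p = 24/625.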
We shall consider the following generalization. A Markov process is given, having a discrete state-space. Denote the measure it defines by $P\{\cdot\}$. For such processes, events which can be described as absorption in a set of states can be defined. Call such an event $A$. We shall construct a second Markov process which defines a measure $P^*\{\cdot\}$ over the same fundamental probability space, so that the relation (1·1) holds between the two measures. In other words, we construct a second Markov process whose (absolute) probabilities coincide with the conditional probabilities of the original process.

Biom. 45
2. NOTATION AND PRELIMINARIES


The result and proof will be given for Markov processes having a continuous time parameter and a discrete state-space. The analogous result for discrete-time processes will easily be seen. We recall some points about stationary Markov transition matrices, which are matrices whose elements $p_{ij}(t)$ satisfy the conditions

$$p_{ij}(t) \ge 0,$$
$$\sum_j p_{ij}(t) = 1,$$
$$p_{ik}(s + t) = \sum_j p_{ij}(s)\,p_{jk}(t),$$

for $s, t > 0$ and for $i, j, k = 0, 1, 2, \ldots$.

If such a matrix is given, together with a probability distribution $p_i$, where $i = 0, 1, 2, \ldots$, which we call the initial probability distribution, a measure is defined on a suitable fundamental probability space Ω. The space Ω can be taken as the space of all real-valued functions of $t$, or as the space of all non-negative integral-valued step functions of $t$, where $t \ge 0$. It is known (see, for example, Doob, 1953) that almost all sample functions of such a process are of the latter type, so, without loss of generality, we will adopt the more restricted fundamental probability space. We shall occasionally refer to the state-space, which consists of the non-negative integers.

For such a process it is also known that if $p_{ij}(t) \to \delta_{ij}$ as $t \to 0$, then the limits $\pi_{ij} = \lim_{t \to \infty} p_{ij}(t)$ exist.

In terms of these limits we establish the following classification of the states. If $\pi_{jj} > 0$ then state $j$ is called positive; otherwise it is called dissipative. The positive states are those which are recurrent with finite mean recurrence time; the dissipative states include the transient states and the states which are recurrent and null.

The positive states are further subdivided into disjoint positive classes $C^\rho$, for $\rho = 0, 1, 2, \ldots$, where $j$ and $k$ belong to the same $C^\rho$ if and only if $\pi_{jk} > 0$. These classes are closed, i.e. if the system enters a particular $C^\rho$ it cannot subsequently leave it.

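There is a set of numbers $w(i, C^\rho)$, defined for each state $i$ and for each positive class $C^\rho$, such that

$$0 \le w(i, C^\rho) \le 1,$$
$$w(i, C^\rho) = \begin{cases} 1 & \text{if } i \in C^\rho, \\ 0 & \text{if } i \in C^\sigma, \text{ where } \sigma \ne \rho, \end{cases}$$
$$\sum_j p_{ij}(t)\,w(j, C^\rho) = w(i, C^\rho) \quad \text{for } t > 0.$$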
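The three properties of w(i, C^ρ) can be checked on the discrete-time analogue that the text says is easily seen. This sketch assumes a hypothetical gambler's-ruin chain on {0, 1, ..., N} in which the absorbing barriers {0} and {N} are the positive classes. Iterating w ← Pw from the indicator of C^ρ converges to the absorption probabilities, which then satisfy Σⱼ pᵢⱼ w(j, C^ρ) = w(i, C^ρ). The function names are illustrative.

```python
def transition_matrix(N, p):
    """One-step matrix of a walk on {0, ..., N}, absorbed at 0 and at N (hypothetical example)."""
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    P[0][0] = P[N][N] = 1.0
    for i in range(1, N):
        P[i][i + 1], P[i][i - 1] = p, 1 - p
    return P


def absorption_weights(P, target, iterations=5000):
    """w(i, C) for C = {target}: iterate w <- P w starting from the indicator of C."""
    n = len(P)
    w = [1.0 if i == target else 0.0 for i in range(n)]
    for _ in range(iterations):
        w = [sum(P[i][j] * w[j] for j in range(n)) for i in range(n)]
    return w
```

For p = 0·6 and N = 10 the result matches the classical closed form ((q/p)^i − (q/p)^N)/(1 − (q/p)^N) for absorption at 0. The weights for the two classes add to 1 in every state, since in this finite chain absorption in one class or the other is certain.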


These numbers $w(i, C^\rho)$ may be described as the probabilities that the system, starting from state $i$, will enter, and thereafter remain in, the class $C^\rho$. However, we cannot adopt this description until the event to which it refers has been defined as a measurable set in the sample space.

3. MEASURABILITY OF THE CONDITIONING HYPOTHESIS

Suppose $(\Omega, \mathcal F, P)$ is the probability space, where Ω has already been described, $\mathcal F$ is the Borel field of measurable subsets of Ω, and $P$ is the probability measure generated by the Markov process. Let $x_t(\omega)$ be the co-ordinate random variable at time $t$. Then the set

$$X = \{\omega : x_t(\omega) \in C^\rho \text{ for all sufficiently large } t\}$$
will not be $\mathcal F$-measurable, because membership of it imposes restrictions on $x_t$ for more than countably many values of $t$. To define a measurable set which can be identified with 'absorption in one of the positive classes' we have to modify the fundamental probability space.

We require some results concerning sets which are thick in $(\Omega, \mathcal F, P)$, which we shall summarize here. Details may be found in Halmos (1950). Consider any measure space $(\Omega, \mathcal F, \mu)$ and denote inner measures by $\mu_*$. A subset $\Omega_0$ of Ω is defined as being thick if

$$\mu_*(E - \Omega_0) = 0$$

for every measurable set $E$. It follows that if Ω itself is measurable, which is so in a probability space, then $\Omega_0$ is thick if and only if $\mu_*(\Omega - \Omega_0) = 0$.

The theorem about thick sets which we use is the following. If $\Omega_0$ is a thick subset of a measure space $(\Omega, \mathcal F, \mu)$, if $\mathcal F_0$ is the Borel field of all sets of the form $E \cap \Omega_0$, where $E \in \mathcal F$, and if a measure of such sets is defined by $\mu_0(E \cap \Omega_0) = \mu(E)$, then $(\Omega_0, \mathcal F_0, \mu_0)$ is a measure space. Let

$$\Omega_0 = \{\omega : \text{for all } s, t > 0 \text{ and for each } \rho,\ x_s(\omega) \in C^\rho \Rightarrow x_{s+t}(\omega) \in C^\rho\}.$$


Consider any set E (€ F) such that E c Q— Q,. E must consist entirely of step functions 
w which take a value in some C at time s and take a value outside that C^ at some later time, 
it being understood that any w which never takes a value in amy C belongs to Qg. Since 
transitions out of C? are impossible, P(E) = 0 whence P,(0— % = 0 and so (2, is thick. 

Applying the theorem we can make Q, into a measure space which is a probability space 
since (c) = 1. Now let X, = X n Qo. Then 


X, ? (o: ) € C? for sufficiently large sj 
colo: for each p and for s,t> 0, x,(w) € C^, ) € C) 
z (o: z,(w) € C? for some integer r> Yn Q 
= Yn, say, where YeF. 


Thus X, e.Z, (i.e. is H. measurable) and we identify this event with ‘absorption in pe . 
The argument is only changed in detail if the event is to be “absorption in either C^ or C 
or more generally in any collection of the classes C^. ^ : 

The modification that has been made in the fundamental probability space can be simply 
described: from the original space we have removed all points which nt. system 
as entering and subsequently leaving any of the closed sets. From now on wes 11 00 
that this modification has been made, but for simplicity we will drop the suffixes and deno 


the space by (Q, F, P). 


OF THE CONDITIONAL TRANSITION MATRIX 


T CTION i 
NC C, and we shall write 


We shall confine our attention to just one of the positive classes, say 

for the probability of absorption in C, starting from state i 
u; = li, C). 

We shall refer to absorption in C as the event A. As before, the generalization to several of 

the positive classes is a matter of detail. 0 


DT. all 
Let F be the subset of the state-space consisting of i dissi- 
as a consequent. In other words, F consists of all the E mmt 


pative states from which it is possible to reach some state of C. £ 


Define a matrix and a set of initial probabilities as follows 


2000 = e (ie F), 
v ng (ig F), 


o 
and : P. = pru D Pju; for all i, 
j=0 


whence P. = 0 for ie F. 


This matrix is stochastic because the absorption probabilities satisfy the relation 
LÀ 
. = X ups(f) for all t>0. 
j-0 


Thus, the matrix is a stationary Markov transition matrix and, together with the initial 
probabilities Pn, it defines a Markov process and a measure on the space Q which we shall 
denote by P. It remains to show that for any measurable set E c Q, 


P(E|4) PIN. 


Let 0— t, «t, « ... <t, be a finite set of parameter-values, and let ao, 41, ..., a, be a set 
of integers in the state-space. Define a set S,c Q by 


S, = {w: %(w) = dp, ..., a, (0) = ap}. 


Then the measurable, finite dimensional subsets of Q are all of the form of S, and if the 
two measures agree on these subsets they are identical. There are two cases, according as 


there is or is not some 4; among ap, ...,a,, which belongs to the complement of F. If aef 
for = 0,..., 1,n then 


PS,) = Pay Pa "n to) DX Har ll a , —1) 


* 
— Su, — Pao ailli 10) a Za, an (5n —iy) 
ao 


1 
T FIA) Pa, Pag altı 100 Pan T Stai) Uan 
7 P(8,, A}/P{4} 
= P(S, | A}. 


Tf, on the other hand, a,¢ F for some J 7 0,1, ...,n, then it can be shown that 


P(8,) = P(S, | 4) = 0. 


Thus in both cases the two measures agree on the finite dimensional subsets, so the result 


244 Conditioned Markov processes 
Suppose that, among the initial probabilities, p;> 0 for at least one ic F. Then 
P(A) = Y, pu, 0. 
i-o 
is proved. | 
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5. THE PROBLEM IN TERMS OF INSTANTANEOUS TRANSITION PROBABILITIES 


A Markov process is frequently defined not by giving the matrix [p,,(t)] explicitly, but by 
giving a matrix Q of constants q;, which satisfy the conditions 


Oxq,«o, when i+j for i,j =0,1,2,..., 
02442 , 
€ — li 
P Tii 


Much attention has been given to the problem of determining further conditions on Q 
under which Kolmogoroft’s differential equations 


put) = pull) Vk (51) 
put) = A0 nl, (5:1a) 
with the initial conditions lim Bult) = pg(0) = 95. 
t> 


possess a unique solution which defines a Markov process. We shall suppose that such con- 
ditions are fulfilled, in that Q is conservative and regular. The conditions for Q to be con- 
servative are that all the q;; are finite, and that Eu — 0 for all i. For conditions ensuring 


regularity we refer to Feller (1940) and Kato (1954). A method of calculating theabsorption 
probabilities u, directly from the matrix Q has been given by Kendall & Reuter (1957). 
Our problem is, given a matrix [q;;] which generates [p;;(¢)] and using these ui, to define a 
matrix dil which generates [p;;(t)]. Almost obviously the definition is 


28 (ieF), 
ide dij (i¢F). 


We merely need to verify that Kolmogoroff's forward equations (5-1), with these d; 
introduced, are satisfied by the pit) defined in $4, i.e. that 


Pix lt) = i S ole) Ie — Z M0 Tin: (5:2) 


First, suppose that i e F. Then the terms for jF are identically zero and so (5:2) is 


J 5:3 
put) = X Vil!) Gre (5-3) 
ger 
On substituting for z(t) and dj, this becomes 
22700) = E 22 0h gne 
1 put) à u Jul ) u; jk 
Provided that ke F this is equivalent to 
„(= t) 
put) p put) Ux 
(5-4) 


ao 
= X put) ans 
j=0 
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replacing in the sum the terms for j¢F, all of which are identically zero. It will be seen that 
(9-4) is the forward equation for the original process, and therefore must hold. On the 
other hand if k¢ F both sides of (5-3) and so of (5-2) are identically zero. 

Secondly suppose that i € F. By similar methods (5-2) can be shown to be equivalent to 
(5-1) for the original process if kg F, and to reduce to zero on both sides otherwise, 


0. APPLICATIONS AND EXTENSIONS OF THE RESULT 
(a) The simple birth and. death process 
We first apply the method to the well-known elementary birth and death process (see, for 
example, Feller 1950, p. 374) where the birth-rate per head per unit of time is A, and the 
death-rate is ji. The solution is similar to that for the random walk described in $1. When 
A> y extinction of the population has a probability less than 1, and to obtain probabilities 
conditional on the hypothesis of extinction it is merely necessary to interchange A and u 
in any formulae derived for the ‘unconditioned’ process. To verify this, note that the 
matrix of instantaneous transition probabilities has elements defined for n >1 by 


In,n-1 nl, 
Inn = N +4), 
in, 141 nA, 


and 4 = 0, whenever j4n—1, » or n+1. 


It is known that the probabilities of absorption at zero, or extinction, are given by 
u; = (si for j = 1,2,... and hence, applying the result of $5, 
Al = ( ( ny 


` nal, 
with similar results for the other 9%. 


(b) The general M. arkovian birth and death process 


The general Markovian birth and death process can be specified by the matrix whose 
elements are 


Qn,n-1 = flys 
Inn == (A, +n); 
Innit = An, 
and Inj = 0, whenever j+n—1, norn+1. 
Consider the series l 5 4 zs 


141 „ 4% 
Kendall & Reuter (1957) have shown that the 


l probability of extinction is 1 if T =%, 
and that if T' «oo the extinction probabilities are 
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Hence e zi 
ence -iu- db: (i,j J). 


- 


The relevant values in forming the modified process are wu, ,,/w, and u,_,/u,, and we can 
write these in a convenient form by defining the series 


T 214 $ Hj pu 


ael Aj Aga ca 
Note that T, = T. We shall make use of the recurrence relation 
em -T,,-l where j>2. 


In terms of these series we obtain 


Unti _ Tnyi—1 


Un Tia : 
1s E” T 
and cm 7-1 1 


We can now define a new set of parameters I, and n, which give rise to an instantaneous 

transition matrix whose elements are 

Tig = (ug ui lij 
as required by our construction. 

. 
We put A. = ES OM Àn 
n+l 
15 An As 
Tari Aja 


T, 


and fn = 


1 ^^ 


T- 
Tk 
T 
In each of these the second form is obtained from the first by means of the recurrence 
relation. Since qn, n+1 = An it is clear that 


Qn 41 = = (u, /) Innt and similarly 44,41 = (us afta) da, ni 


To verify that Gan = gun we require that Ant An = An Hin · Now 
AA = An (es 210 


Tia +1 


= An —1+T, 
= si (La n) 
-1 


- irs 
Tati 


bad An n 


An 
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Thus the parameters A, fi, for n = 1, 2, ..., give rise to a birth and death process whose 
probabilities agree with the conditional probabilities for the original process, on the 
hypothesis of ultimate extinction. 


(c) T'he discrete-time chain reaction 


We now consider the process which appears in the Galton-Watson problem of the 
extinction of surnames, in the nuclear chain reaction, and in other contexts. Tt is described 
by Feller (1950) so full details will not be given here. A particle, or individual, in the nth — 
generation has probabilities (g,: k = 0, 1, to give rise to k particles in the n+ Ist gen- 


eration, and particles reproduce independently of one another. If m kq; » 1 extinction has 
a probability less than 1, and we suppose that this condition is fulfilled. 

Let Q9) = X ast. 

Then the probability of extinction, when the population starts with just one individual, is 
given by that root of t- QD 


dis unique. Ifthere are J particles in the population 
subsequent extinction is % = C. 


which is less than 1. Such a root exists an 
at a given generation the probability of 

In this problem it is of interest to find a distribution ] for numbers of progeny of an 
individual, which will give rise to the modified transition matrix Pi; (for discrete time) 
and hence to the conditional probability measure. Such a distribution is defined by 


d = oq, (K =0,1,...). 
Note that the generating function of this is given by 


Q(s) , 


whence 00) = 1. In view of the values of the extinctio: 


n probabilities u, we must show, 
for the elements of the transition matrices, that 


Py = Ẹ Pij 


Now pj, is the coefficient of in O), while Ju is the coefficient of s! in 


[Q5] LO 
whence the required relation follows. 


The mean number of progeny for the distribution 
follows from Feller’s discussion that this is 


Yaglom (1947), Hawkins & Ulam (1944) 


0 
chain reaction when Y kq < 
k=O 


(iJ is given by Q'(1) = Q'(&), and it 
strictly less than 1, 
and Otter (1949), have given theorems for the 


1, some of which are conveniently summarized by Harris 
(1951). Our preceding remark shows that those theorems which apply when » kg, < 1 can 
k=0 


be used to obtain probabilities, conditional on extinction, when 5 kqy 1. 
ke 
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(d) Further remarks on the theory and applications 

The ‘interchange of parameters’ result (a) for the simple birth and death process has 
been applied by Kendall (1956) to the study of the ‘threshold theorem’ for epidemics, It 
seems appropriate to mention here that the general problem arose out of an investigation 
of the special case of the birth and death process, and to thank Mr D. G. Kendall for his 
suggestion of the original problem and for a great deal of other valuable advice. Dr G. S. 
Watson suggested the application (c) to the chain reaction which arose in a current in- 
vestigation of a theory of the size of large molecules, 

The simple birth and death process described in (a) does not reflect in any way the lives 
of individual particles, and models involving some space of 'trees' are more suitable when 
problems like that of the age distribution are considered. Details of such a model will be 
given elsewhere but it may be mentioned that the ‘interchange of parameters’ result holds 
for the age distribution. 
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MISCELLANEA 
Ranking means of two normal populations with unknown variances 


By RITA MAURICE 
University College London 


on the basis of a single sample from each population, it is desired to rank the means of two normal 
pulations with a given probability, P, of a correct ranking when the difference between the means is 
knowledge of the population variances is required to determine the size of the samples on which the 
nking is to be based. Stein (1945) put forward a two-sample procedure for testing a value of the popula- 
n mean when the population variance is unknown, an initial sample of size 74 being used to estimate 
e variance and hence to determine the size, n, of the second stage of sampling. This type of procedure 
is adopted by Bechhofer, Dunnett & Sobel (1954) for ranking means of normal populations when the 
riances are in known ratio. For the special case of two populations, similar procedures may be used 
1en the variance ratio also is unknown and nothing is known about the variances, , g’?, 

The problem consists in determining the value n, 80 that the correct selection is made with probability 
(or greater) when the difference between the population means is 1 - = 0790. We shall write 
a (i = 1,2, my, s N 4-4) for the sampled observations from the two populations; z,, z for the 
dans of the observations in the first samples of 75, X, Z' for the means of the combined n = n, +n 
servations, 

The first procedure considered consists in (i) taking a sample of n, from each population; (ii) using 
e samples to estimate c? -- 0’; (iii) from this estimate determining a value u; (iv) drawing a second 
mple of n; from each population; (v) choosing as the population with the larger mean that for which 
e total sample of n; +n, observations has the greater mean. , 

An estimate s*, of g*-- may be made by pairing at random observations from each sample, as 


zgested by Bartlett and mentioned by Welch (1937, P. 360) for the problem of testing the difference 
tween two means. Thus we may take 


ni 
82 > (v — xi — 2, +71)? (m — 1). 
Pos 
, N: 
ternatively we could take st = $ ( zi — 2, —24)*/(n, — 1). 
i=1 


‘iting h for the percentage point of the distribution of Student's ¢ having n, — 1 degrees of freedom 
ich satisfies Pr (6,17 —h} = P, n, is determined from the relation 
n= +N = max (n, [s*h2/5*] I), 


ere [s?^?/8?] indicates the largest integer less than 8*h?/5*. Then if n, i i 
i is th f the d le 
be drawn from each population, a ranking of the populati E — 


ki ata : on means corresponding to the ranking of 
pt means E, 7 satisfies the required probability condition. That this is the case may be seen as 


"he probability of the population with the larger mean having also the larger sample mean is 


an d pad} = Pr l T. (I -u 
Pr (- 0 2n aide > nj] 


= Pr(t, 1> -s. 
since n has been determined so that à /s l, it fi ility i 
a ROO ENTE Ve x h, it follows that the actual probability is greater than 
nother possibility is to take initial samples of size n, fro i 
J ; s 1 from each population and to make from these 
e Penge) 85, 8"? of Sn, g^. This was considered by Chapman (1980) for lasting te ilo Ble. 
en take further samples of size Na Na, respectively, from the two populations such that 


n = m +n, = max {n,,[s%h?/57]41}; n. = m+ = max (ui. [52/2/52] + 1). 


-— | 
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For this procedure the chance of a correct choice is given by 


1 2 
Pr(z-u—Z'u'»—8)-— —| ehde, 
aR ig co cM 
here T = ó/(?|[n + o’? /n'}. This integral of the normal function is an increasing function of n and n’. 
hus, a lower limit to the value of the probability of a correct choice when = = 6 may be found by 
garding the »'s as continuous variables and putting each equal to its minimum possible value, i.e. 


An, = Sh|B, nj = s'hj8. 
120 


o 
o 


œ 
e 


60 


40 


20 


Sum of expected total samples, E(2n) or E(n + n") 


20 40 60 80 100 
Sum of initial samples, 2n, ; 
ô= 0-40. —-—-, Ratio of variances known; 


, separate estimation of q, q^, ^ 


hus, the actual probability is greater than (or equal to) 
12 fece. i». p) Ji i). 
8 


| i i i dent t variates 
he left-hand side of this inequality is distributed as the difference between two independen: 


b Pe yi qe ier n 
ith n, — 1 degrees of freedom, and h is therefore the percentage point iieri ions 5 isd 
tudent's t, Percentage points of the difference of two C termed 7 5 — 1 
alculated by Sukhatme (1938) and his values for & = 45° may 2 a br F 57 
r small numbers of degrees of freedom have been prepared by 13 e eee 
Expected sample sizes for these two procedures have been caleu! A . . 
Y Seelbinder (1953) which assume that, if a second sample is Cc cna OS m 15 Wun (ead the 
ontinuity of nd. Some of Seelbinder’s tabulated values for n have 
sults are shown together with the results of similar calculations 
954) when the variances are in known ratio. 
thor in the ratio 3: 1. 3 
The figures show that the expected sample sizes 
tio. This is presumably because the probability 
arianco of the difference of the sample means, Gn + 


i i the variance 

not much increased by ignorance of n 
ofa correct choice, given d, depends on the sampling 
n. For single samples of a given combined 
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size n+n’ = N, say, this variance is the same whether n= n’ = EN or n/n’ = 0?/o7*, Thus, the only 
advantage of the Bechhofer, Dunnett and Sobel procedure (in which n/n’ = g?/g"?) over the random 
pairing of the sample values (in which n = n^) lies in the greater number of degrees of freedom in esti- 
mating g? and determining h. This advantage is greatest when n is small. Random pairing of sample 
values gives better results than considering each population separately especially when the variance 
ratio is large. However, the latter procedure has the advantage of extension to the case when more than 
two populations are being considered. 


160 


agere 2 Single sample, 
cur eee known variances 


Single sample, 
optimum allocation 


Sum of expected total samples, E(2n) or E(n +n’) 
B 
eo 


è 


40 80 120 160 200 240 
Sum of initial samples, 2n, 
Fig. 2. Expected total sample sizes, P = 0-95, 302 = 0^, 0 = (-4c. —.—., Ratio of variances known; 
random pairing of sample values; ————, separate estimation of o*, O2. 
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Non-randomness in a sequence of two alternatives 
II. Runs TEST 


Bv D. E. BARTON ax» F. N. DAVID 
University College London 


It is assumed that there exists an infinite population composed of two charaeteristies in the proportions 
p and q. A sample of r is randomly drawn from this population, the elements being laid in a line according 
to the order of drawing. Under the null hypothesis the alternation of the two characteristics along the 
line will be random. Under the alternate hypothesis it will be supposed that the selector is influenced in 
the choice of the (v + 1)st element by his knowledge of the vth element. Such a situation might arise if, 
confronted with a very large number of photographs of men and women and asked to pick out 7 in 
ascending age order, the selector tended to choose a photograph of a women if the one previously selected 
was also a woman. We have previously described this situation as one of persistence of type. 
Let the suffix i denote the ith ranking position and denote the two characteristics by z and y. Then 


P(r)-p-21-q-1-P(y) (i212...) 
If in r drawings there are ri us and rẹ y’s the probability distribution of 7’, the number of runs of both 
alternatives, is, under the null hypothesis 
1-10, nO, pig? — 20710, 471014 
P(T = 21 ru. rg Hy) = 2 cue c oH 
and 
(AC 110,4. 30,530, ,) pg El 
— — ͤ eS ee M 088879 
1C, png Cr, 
This result is possibly due to Whitworth (Choice and Chance, Problems 193 and 194) but was probably 
known before him. 
Under the alternate hypothesis we shall 

Markoff chain. We suppose the probabilities 

Pl. i = p+, Plz = 1 -0) 

Pv, i = I-). PH d = q*pÜ (i 2, 3, .). 
It will be noticed that Pix; | £i} Py Jaa} =1 


= P(z, ] Phar, | tea} + Pye} Plo | ed = P 
the non-null case, i.e. when 0 +0, is the 


P(T = 2t+1 | rira Hy) = 


assume that persistence of type can be described by a simple 
for the single event are unaltered but that 


as expected. Under these assumptions the distribution of T in on 
equilibrium position of that described by David (1947). We have, writing 


1-0p 47 pe (1 +0)t+ (1.—0) (rpa - Grp? rs(a 0) | 
57 Xli Rum = z] ieee ‘orf {1—0)(p+99) (q+ P9) 
2 1 pall — 0)? t 
"i PUD =2t| rre Hi) = 77050703175 e yg+p0) |” 


1 I ea 
and P(T = (2t + 1) |ru T Hy) = p" "Cx (54-90) (a+ PO) 


—p)0 pq1-0?» T 
x[ma +6) +04 ; (p++ p) 
i already been discussed b; David (1947). When 0 = 0 the dis- 
huis ae varying ate Jo are 8 8 rf is 8 the critical region will be 


tribution reduces to that for Hy; the bounds fo N Bot 17 
the lower tail of the distribution under Ho; when it is negative the upper ia Pag Ad E 

The complete description of any sequence is given by the number o 5 eee tient Daor- 
two compositions of r, z's and r, y's, together with the information as . 
vation, When r, and rs are decomposed into the same number of ope dd iia o 
number of runs. When the number of components 1 by 1 the total num fone 
Specify the two compositions by (an- , dz), (Day +++» bi), where 

i = Xa, 71 Eb; and 1-1. 
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Let a be a characteristic random variable taking the value 1 if the sequence starts with an x and zero 
otherwise. Various functions of the compositions have been used as test functions for randomness in 
the sequence, generally conditional on r, and rg. 

Now the probability of any sequence of events (each of which must be either z or y) of specification 
[ri ra {a,}, {b} æ], under the null hypothesis of a sequence of r independent trials is p'q® and is in- 
dependent of the other variables in the specification. This probability is (4)' if p = q. Under the hypo- 
thesis H, for persistence of type, but with the additional specification that p = 3, the probability of 


any given sequence is (y 1 0e (1 4 ee, a) 


that is to say it depends just on 0, r, and k+ 1 (the total number of components of r, and 7,). We have 


called T' = k+ 1 in the preceding paragraph. It follows that under these cireumstances, for a fixed sample 
size r, we have that T' is sufficient for 0. 


We note that, summing the probabilities of each sequence over all sequences of the same value of T, 
and writing 7(z) to denote the probability generating function of T' under the null hypothesis 


1—-0|T 1-0 
vas ng = (19) pers no [e (15) 
It will also be noted that this result is true if we consider the distribution conditional on r, and ry since 
these do not enter into (1), and we may therefore consider a subset of the sample space to obtain 


1-07 1-0 
BUT rir Hi) = 1x6 P(T |, 7,; Ho) / Ts 146 71, 72 
where u | rj, rz) is the null hypothesis probability generating function of T conditional on i and ry. 
ing 0 positive, values of the power function for two different Sequences and three different values 


of @ are given in Table 1. The critical region was made exactly 0-05 in each case, for purposes of com- 
parison, by taking a proportion of a frequency block. 


Table 1. Powers of T, S and b under the alternate hypothesis of dependence 


| | 
r * fs 0 | 1/4 3/5 7/9 0 
| | 

10 5 | ul T 0-178 0-587 0-842 0:05 
S 0-126 0-363 0-593 = 
b 0-055 0-175 0-288 -— 

10 6 4 | p | 0-178 0-587 0-842 0-05 
S | 0-124 0-362 0-593 — 


For sequences of reasonable length and r, and r, not very different it is possible to assume normality 


for T under Ho. The distribution of under Hi may, for values of h not very different from 0, also be 
assumed normal, but the mean and variance will be different. The moments of 7’ under the alternate 


hypothesis, when p = J. can be found by using the same device as in our earlier paper (Barton, David 
& Mallows (1958)). Let k,(9) and x, denote the vth cumulants under the alternate and the null hypothesis 
respectively. If 


8 = log [(1 —0)/(14-0)] 


then K (9) = vH a kato 
and in particular Ky(0) Kd. Ky + 48%. 4 +459. c., 
KO) 3 Ka +8. 4 + 202. , 
K(0) Sky ＋ 0. Kn, 
K(O) K.. 
Since the distribution of 7 under Ho is quickly normal with increasing r so that x, (v > 2) tends reasonably 
quickly to zero these expansions should be ad 1 


equate as regards order for 0 small. The factorial moments 
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of T under the null hypothesis are of reasonably succinct algebraic form (Barton & David (1957)), but 
the central moments and cumulants do not appear to reduce easily. The first four cumulants under H, are 


21 1 27 
ia eya, e (-i). 


| _ 2r,n (l6riri 4r,r.(r+3) 
er) ( re liye +3), 
V, nn 48(5r—6)rir} 48(2r?--3r— 6) N72 
1799 a r3. 
4r? ＋ 49r — 97r — 18 
— —— n- uo. 
When r, = 7r, the above reduce to 
ori | r(r—2) " —r(r— 2) (r* — 4r-- 6) 
nct A acp Be Un LEE» 


l 1 
m= ips 


Table 2. Bivariate distribution of b and T for a sequence (7,3) 


T ELM EE e 
b 2 3 4 7 Total 
0 1 2 4 10 
1 Et. 2 8 10 50 
2 A 2 8 10 50 
3 1 2 4 10 
Total 2 8 24 20 ub 


For 0 < 0-6 a satisfactory agreement was found between the true mean and variance of T (as found from 
the caleulated probability distribution function) and from the series expansion or the cumulants. This is 
not, however, true for the x,(0) and the «,(0) of 7, and it is clear that for the series expansion to be useful 
in these two cases higher moments of I in the null case must be calculated. To cut the series short, as 
has been done in the previous section, will be adequate for ca) and x,(0) only when @ is, very approxi- 
mately, «0-2. In this latter case the distribution of T, will be approximated to by the normal distribu- 
tion. For large values of 0 the distribution of T' is J-shaped and in order to approximate to it by (say) 
a Pearson curve, the series expansions for x) and x,() will need to be extended. E 
In our previous paper the powers of S, the sum of the ranks of one characteristic, of b, the number o 
one characteristic below the median of the sequence, and of T', were compared under the same alternate 
hypothesis, Using the same arguments as we set forward there, we may compare the powers of S and T, 
and of b and 7, under the dependence alternative, using the bivariate distribution. The arrays 05 E 
distribution of & for T fixed are weighted by [(1—8)/(1 t and then added for T keeping S 4 
The critical region for S under the null hypothesis is the sum of the two tail areas. The power of & to 


detect 0+0 is given in Table 1. i ivari 
The joint distribution of 7 and b may be written down in explicit algebraic form and the e 
table constructed. For the sake of illustration we given the bivariate onere iu oie id th 
When the total number in the sequence is even it is possible to make a dichotomy at uom Hs akon. 
table is symmetrical about &(b). It follows that &(b | 7’) is constant. S eed = Fg "1 5 hod thé 
is odd, we make a dichotomy between the Rth and the (R+ 1)st observations (R= Som Dd pu 
symmetry of the table disappears. The regression of T on b is Ives idi e " Bak. 
may be found either from the joint probability distribution function or ** d T, the 
tions. Let T, be the number of runs of both characteristics below the point of dichotomy and £, 


number above, Then T-T,-T,-a, 


R r—R 
b(r,—b) +(R—b) (r—R-r, +b) 


&(a|b) = R(r-H) , 


whence, on substitution, 
1 
e(T|b) = dr ia ex nn pe 1) —b(r—2R + 2r,(2R—1)) + Rr,(2r, 1)}. 


The maximum value of &(T' | b) will be when 


1 
b= ar-n* 2R--2r,4(2R— J)), 
which for a median dichotomy of an even number of observations reduces to b = gr. The regression of 
T' on b under the alternate hypothesis is approximately, following the argument already set out, 
EIT |b, Hy) = &(T | b, Hy) +8. 05 (Hy), 
and may therefore be calculated once o3. Ho) is found. This second moment will be of the fourth power 
in b, but since we do not need the regression under Hi we have not calculated it. 

The power of b under the alternate hypothesis can be found in precisely the same way as we have put 
forward for calculating the power of S. Because b can take few values in a short sequence the power has 
been found for a (5?) sequence only. The critical region, the sum of the two tail areas of the b-distribution, 
has been forced to be 0-05. Tt will be noticed that b appears to be of little value to detect dependence 
in a sequence. For the moments of b under the alternate hypothesis we have not been able to find simple 
expressions. Since the denominator of the general expressions is the probability density function of the 
number of events in a Markoff chain of two alternatives, and this has not yet been found expressible in 
terms of elementary functions, it seems unlikely that the moments of b will be tractable. 

Tn a previous paper we put forward an alternate ranking hypothesis for which S was a sufficient test 
statistic. In this paper we have put forward an alternate hypothesis against which T is a sufficient 
statistic. The powers of S, T' and b have been compared for each model. 
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Note on multiple comparisons for adjusted means in the analysis of covariance 


Bv MAX HALPERIN* anD S. W. GREENHOUSE} 
National Institutes of Health 


l. INTRODUCTION 


The analysis of covariance, in the simple application to a one-way classification, deals with the problem 
of comparing k-class means of a variable y in the presence of a covariate z. If we observe 


(Ja- un), (Je- ). , (Yinp Zin) 
in the ith class, the usual assumption 


(using & to denote expected value of) is that Syy = a, + bry 88 
opposed to the customary situation in 


the analysis of variance where £y; = a,. Since the mean of the 


* Division of Biologies Standards. 
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where = 1 if the last element below the point of dichotomy and the first above are of like characteristics 
and « = 0 otherwise. The conditional expectations we write as 

EIT |b) = &(7, | b) + &(T, | b) — 4(a | b). 

It is immediate that 8 A x ES 

R-b 2(r, —b) (r— R— (r,— 

S(T, |b) = 14.2489 ATA = Feet FO) 

T National Institute of Mental Health. | 
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ith class y,, has the expectation a, 4- bz, , then clearly, if r. & E,, 4j, differs from 4y, even if a, = aj 
One therefore asks whether the k-class means are equal after adjusting all the observations to a 9 
x, say F., the grand mean of the zy. 

This question is ordinarily answered by computing the appropriate F-test (Snedecor, 1956; Anderson 
& Bancroft, 1952). The arithmetic of this test is as follows. calculating the least-squares estimate 
of b from within the k classes, assuming there are class differences, one computes the pooled deviations 


: ; k 
from regression to yield an error mean square with È n,— k— 1 degrees of freedom. One next computes 
i=l 


k 
the deviations from a regression line fitted to all © n; observations and subtracts from this the error sum 


t=] 
of squares previously computed. The difference so obtained leads to a mean square between the k classes 
with k— 1 degrees of freedom. The ratio of this latter mean Square to the error mean square is the 
computed F. 

Now, it is usually the case that we are less interested in a test of homogeneity of the adjusted true
means than in confidence intervals on contrasts among them. Snedecor, in the latest edition of his
Statistical Methods (1956), suggests the usual confidence limit procedure for comparison of two groups;
for comparison of pairs of adjusted means when there are more than two groups, a sequential test of
pairs of adjusted means, discussed by Hartley (1955), is proposed. This latter procedure controls errors
in the multiple comparison sense for testing all pairs of adjusted means. It seems worth mentioning in
passing that the application of Hartley's procedure to the comparison of adjusted means appears
inappropriate on two counts. In the first place, Hartley's procedure assumes independence and
homoscedasticity of the means being compared; secondly, his procedure assumes equal sample sizes in the
various classes being compared. The general situation in the comparison of adjusted means fails to meet
either of these requirements. Thus, results of the use of the Hartley procedure in this application should
be viewed with some reserve.

The main purpose of this note is to point out the apparently unrecognized fact that Scheffé's work 
(1953) on multiple comparisons is immediately applicable to testing and obtaining confidence intervals 
for contrasts among adjusted means. Use of the method is conservative if we are only interested in 
comparison of pairs of adjusted means; however, it has the virtue of being precise in the probability 
sense. 

It is well known that, although one thinks of the test previously described as being on the set of $k$
adjusted means, $\bar y'_i = \bar y_{i.} - \hat b(\bar x_{i.} - \bar x_{..})$, the numerator sum of squares reflecting the variation among the
$k$ adjusted means is not identically equal to $\sum_{i=1}^{k} (\bar y'_i - \bar y'_{.})^2$, the sum of squared deviations of the adjusted
means about their mean. This is so because the $\bar y'_i$ have different variances and are correlated. It is perhaps
for this reason that the applicability of Scheffé's multiple comparison theorem has not been obvious.


2. APPLICABILITY OF SCHEFFÉ'S MULTIPLE COMPARISON THEORY 
For convenience in discussion we first state the theorem of Scheffé on multiple comparisons in a form 
suitable for this application. 

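THEOREM. Let $\hat\mu_1, \hat\mu_2, \ldots, \hat\mu_k$ have a multivariate normal distribution such that

$$\mathcal{E}\hat\mu_i = \mu_i \quad (i = 1, 2, \ldots, k),$$
$$\operatorname{cov}(\hat\mu_i, \hat\mu_j) = a_{ij}\sigma^2 \quad (i, j = 1, 2, \ldots, k),$$

where the constants $a_{ij}$ are known and $\sigma^2$ is unknown. Let $\hat\sigma^2$ be an estimate of $\sigma^2$ distributed like
$\sigma^2\chi^2_m/m$ with $m$ degrees of freedom, independently of $\hat\mu_1, \hat\mu_2, \ldots, \hat\mu_k$. Let the $\hat\mu_i$ and $\mu_i$ satisfy

$$\sum_{i=1}^{k} h_i\hat\mu_i = \sum_{i=1}^{k} h_i\mu_i = h,$$

where the $h_i$ and $h$ are known constants. Then, if $\theta = \sum_{i=1}^{k} c_i\mu_i$, where the $c_i$ are arbitrary
except that $\sum_{i=1}^{k} c_i = 0$ ($\theta$ is a contrast), and the rank of the distribution of the $\hat\mu_i$ is $k - 1$, the probability
is $1 - \alpha$ that the values $\theta$ of all possible contrasts simultaneously satisfy

$$\hat\theta - Q\hat\sigma_{\hat\theta} \le \theta \le \hat\theta + Q\hat\sigma_{\hat\theta},$$

where

$$\hat\theta = \sum_{i=1}^{k} c_i\hat\mu_i, \qquad \hat\sigma^2_{\hat\theta} = \hat\sigma^2\sum_{i,j=1}^{k} a_{ij}c_ic_j,$$

and $Q^2 = (k - 1)F_\alpha(k - 1, m)$, where $F_\alpha(k - 1, m)$ is the upper $100\alpha\,\%$ point of the F distribution with $(k - 1)$ and $m$ degrees of freedom.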
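A small Python sketch of the theorem's interval for a single contrast follows. The function names are hypothetical; the user supplies the constants $a_{ij}$, the estimate $\hat\sigma^2$ and the tabulated point $F_\alpha(k-1, m)$, because the sketch does not compute F percentage points.

```python
import math


def scheffe_q(k, f_alpha):
    """Q = sqrt((k - 1) * F_alpha(k - 1, m)); f_alpha is read from F tables."""
    return math.sqrt((k - 1) * f_alpha)


def scheffe_interval(c, mu_hat, a, sigma2_hat, q):
    """Simultaneous interval theta_hat -/+ Q * sigma_hat_theta for theta = sum(c_i * mu_i)."""
    if abs(sum(c)) > 1e-12:
        raise ValueError("the c_i must sum to zero (theta must be a contrast)")
    k = len(c)
    theta_hat = sum(ci * mi for ci, mi in zip(c, mu_hat))
    # sigma_hat_theta^2 = sigma_hat^2 * sum_ij a_ij c_i c_j
    var_theta = sigma2_hat * sum(a[i][j] * c[i] * c[j] for i in range(k) for j in range(k))
    half_width = q * math.sqrt(var_theta)
    return theta_hat - half_width, theta_hat + half_width


if __name__ == "__main__":
    q = scheffe_q(3, 4.5)  # hypothetical k and tabulated F point
    print(scheffe_interval([1, -1, 0], [5.0, 2.0, 0.0], [[1, 0, 0], [0, 1, 0], [0, 0, 1]], 2.0, q))
```

For instance, with $k = 6$ and the tabulated value $F_{0.05}(5, 14) = 2.96$ (assuming $m = 14$, our own count of error degrees of freedom for six varieties in four randomized blocks with one covariate), `scheffe_q(6, 2.96)` is about 3.847, in line with the multiplier 3.848 used in § 3.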
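Thus, we need only verify the correspondence between the theorem and the problem at hand. To
do this we need some notation. Thus, let

$$\hat b = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar x_{i.})\,y_{ij}}{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar x_{i.})^2},$$
$$\bar y'_i = \bar y_{i.} - \hat b(\bar x_{i.} - \bar x_{..}) \quad (i = 1, 2, \ldots, k),$$
$$S_y^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \hat a_i - \hat b x_{ij})^2, \qquad \hat a_i = \bar y_{i.} - \hat b\,\bar x_{i.} \quad (i = 1, 2, \ldots, k),$$
$$S_{xx} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar x_{i.})^2.$$

Now we let $\hat\mu_i = \bar y'_i - \bar y_{..}$ and have

$$\operatorname{cov}(\hat\mu_i, \hat\mu_j) = \sigma^2\left[\frac{\delta_{ij}}{n_i} - \frac{1}{\sum_{t=1}^{k} n_t} + \frac{(\bar x_{i.} - \bar x_{..})(\bar x_{j.} - \bar x_{..})}{S_{xx}}\right],$$

where $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \neq j$; this identifies the $a_{ij}$. $S_y^2$ estimates $\sigma^2$ with $\sum n_i - k - 1$ degrees of freedom,
is well known to be independent of $\bar y'_1, \bar y'_2, \ldots, \bar y'_k$, and is thus identified with $(\sum n_i - k - 1)\hat\sigma^2$. The $\hat\mu_i$
are restricted by $\sum n_i\hat\mu_i = 0$, so that $h_i = n_i$ and $h = 0$. It remains only to show that the rank of the
covariance matrix of the $\hat\mu_i$, $\Sigma$ say, is $k - 1$. We have, aside from a constant multiplier $\sigma^2$,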


$$\Sigma = D^{-1} - \frac{1}{\sum n_i}\,j'j + \frac{1}{S_{xx}}\,d'd,$$

where $D$ is a $k \times k$ matrix with zeros off the diagonal and $n_i$ as the $i$th diagonal element, $j$ is a row vector
of ones, $d$ is a row vector with elements $d_i = \bar x_{i.} - \bar x_{..}$, and the prime denotes the transpose of a matrix.

We can write

$$\Sigma = D^{-1}\left[I + \frac{(nd)'d}{S_{xx}} - \frac{n'j}{\sum n_i}\right],$$

where $I$ is the identity matrix and $(nd)'$ and $n'$ are column vectors with elements $n_id_i$ and $n_i$, respectively.
Since $\sum n_id_i = 0$, further factorization of the matrix leads to

$$\Sigma = D^{-1}\left[I + \frac{(nd)'d}{S_{xx}}\right]\left[I - \frac{n'j}{\sum n_i}\right].$$

Since $D^{-1}[I + (nd)'d/S_{xx}]$ is non-singular, the rank of $\Sigma$ is the rank of $I - n'j/\sum n_i$, which is easily shown
to be $k - 1$. Thus all the conditions of Scheffé's theorem are satisfied.
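The rank argument can be checked numerically in exact rational arithmetic. The sketch below uses hypothetical class sizes, class means of $x$ and $S_{xx}$ (all names are ours), builds the matrix of $a_{ij}$ given above, confirms that its rank is $k - 1$, and checks that the contrast $\mu_i - \mu_j$ has variance factor $1/n_i + 1/n_j + (\bar x_{i.} - \bar x_{j.})^2/S_{xx}$, the form used for paired comparisons in § 3.

```python
from fractions import Fraction


def adjusted_mean_cov(n, xbar, sxx):
    """a_ij = delta_ij / n_i - 1 / sum(n) + d_i d_j / S_xx, with d_i = xbar_i. - xbar..
    (the covariance matrix of mu_hat_i = ybar'_i - ybar.., divided by sigma^2)."""
    big_n = sum(n)
    grand = Fraction(sum(ni * xi for ni, xi in zip(n, xbar))) / big_n  # weighted grand mean
    d = [Fraction(xi) - grand for xi in xbar]
    k = len(n)
    return [[(Fraction(1, n[i]) if i == j else 0) - Fraction(1, big_n) + d[i] * d[j] / sxx
             for j in range(k)] for i in range(k)]


def matrix_rank(rows):
    """Rank by Gauss-Jordan elimination, exact for Fraction entries."""
    m = [list(r) for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def pair_variance_factor(a, i, j):
    """a_ii + a_jj - 2 a_ij: variance factor of the contrast mu_i - mu_j."""
    return a[i][i] + a[j][j] - 2 * a[i][j]


if __name__ == "__main__":
    a = adjusted_mean_cov([3, 5, 4], [2, 5, 3], 10)  # hypothetical n_i, xbar_i., S_xx
    print(matrix_rank(a), pair_variance_factor(a, 0, 1))
```

The null vector is $n'$ itself, which is why the rank falls short of $k$ by exactly one.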


It is also obvious that the case of multiple covariates can be treated similarly. The analysis given above
also holds for randomized blocks with $S_{xx}$ replaced by

$$S_{xx} = \sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij} - \bar x_{i.} - \bar x_{.j} + \bar x_{..})^2,$$

and $S_y^2$ appropriately modified.
3. AN EXAMPLE


The following example is taken from Snedecor (1956), pp. 404-6. It is a randomized block experiment 
in which the variable of interest (y) is the yield in pounds field weight of ear corn for six varieties of corn 
and the covariate (x) is ‘stand’. The relevant data are as follows, using the notation of § 2. 


Variety   n_i   x̄_i.    ȳ'_i
   1       4    24.00   191.8
   2       4    25.25   191.0
   3       4    20.50   193.1
   4       4    28.00   219.3
   5       4    27.75   189.6
   6       4    26.50   213.6


If we consider only paired comparisons between adjusted means, the contrast variance
reduces to

$$\hat\sigma^2_{ij} = \hat\sigma^2\left[\frac{1}{n_i} + \frac{1}{n_j} + \frac{(\bar x_{i.} - \bar x_{j.})^2}{S_{xx}}\right],$$


l C. .) 
which for this case gives BE (4110 x 97-22. 


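From the preceding discussion we may assert that, with probability equal to or greater than 0.95,

$$\bar y'_i - \bar y'_j - 3.848\,\hat\sigma_{ij} \le \mu_i - \mu_j \le \bar y'_i - \bar y'_j + 3.848\,\hat\sigma_{ij}$$

for all $i$ and $j$, where $\hat\sigma_{ij}$ is the estimated standard deviation of $\bar y'_i - \bar y'_j$.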

Detailed calculation for the largest difference of the $\bar y'_i$ (variety 4 against variety 5) gives a value for
$3.848\,\hat\sigma_{45}$ of 26.8, so that variety 4 is significantly greater in yield than variety 5. A minimum value for
$3.848\,\hat\sigma_{ij}$ is given by assuming $\bar x_{i.} = \bar x_{j.}$. This gives a value of 26.5, which indicates no further significant
differences between pairs of $\bar y'_i$ except perhaps $\bar y'_4 - \bar y'_1$ and $\bar y'_4 - \bar y'_2$. Calculation of $3.848\,\hat\sigma_{ij}$ for these two
cases shows that these differences are not significant.
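The screening logic of this paragraph is easy to mechanise. The sketch below takes the last column of the example's table to be the adjusted means $\bar y'_i$ (our reading of that table) and uses the minimum critical difference 26.5 quoted above; it lists the pairs whose difference exceeds that bound, since no other pair can be significant and only these need the exact $\hat\sigma_{ij}$. Names are ours.

```python
from itertools import combinations

# Adjusted variety means, read from the example's table (assumed to be the ybar'_i).
ADJ_MEANS = {1: 191.8, 2: 191.0, 3: 193.1, 4: 219.3, 5: 189.6, 6: 213.6}
# Smallest possible value of 3.848 * sigma_hat_ij (reached when xbar_i. = xbar_j.), from the text.
MIN_CRITICAL = 26.5


def pairs_needing_check(means, min_critical):
    """Return (larger, smaller) pairs whose difference exceeds the minimum critical value."""
    flagged = []
    for i, j in combinations(sorted(means), 2):
        diff = means[i] - means[j]
        if abs(diff) > min_critical:
            flagged.append((i, j) if diff > 0 else (j, i))
    return flagged


if __name__ == "__main__":
    print(pairs_needing_check(ADJ_MEANS, MIN_CRITICAL))
```

It flags exactly the three comparisons involving variety 4 that the text examines in detail (against varieties 1, 2 and 5); of these, only 4 against 5 survives the exact calculation.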

In contrast to this analysis, Snedecor, using the sequential testing of Hartley, asserts that varieties 4
and 6 differ from the remaining varieties but not from each other.
However, as earlier remarked, this conclusion should be viewed with reserve.

Note. Somewhat the same problem is considered in part of a paper by Kramer (1957). The difference
between Kramer's paper and this note is that Kramer considers an extension of Duncan's multiple range
test, and thus is considering control of an error rate different from that controlled in the Scheffé procedure;
in addition, there appears to be no proof that for his proposals error control in the Duncan
sense is preserved, or even that his proposals provide conservative tests.
An empirical investigation into the distribution of the F-ratio in samples
from two non-normal populations 


By H. R. B. HACK 
Glasshouse Crops Research Institute, Littlehampton 


1. DESCRIPTION OF THE DATA AND PROBLEM FOR INVESTIGATION


As a preliminary to the study of the growth of tomato plants under glasshouse conditions, a root
excavation was carried out at Cheshunt Research Station in 1950 (Leonard, 1952). Root lengths in 10 cm.
cubes of soil were collected in eight horizons, referred to as Depths I–VIII. Each depth formed a 10 × 8
row and column layout. The values for Depths IV and VI, which are the two non-normal populations
used in this empirical investigation, are given in Table 1. The frequency distributions are shown in Fig. 1.


Fig. 1. Frequency distributions of root length per cube (cm.) at Depth IV (mean = 28.6, s = 24.5) and Depth VI (mean = 18.9, s = 25.4); vertical axes: number of soil cubes.


Inspection of the data suggested that there were certain major trends as a function of the three
dimensions of the solid excavation. Although it would appear perhaps unlikely that a root sample would have
a value independent of the neighbouring samples, it seemed necessary, first, in terms of the row and 


error term. This might be attempted by the usual 
variance ratio tests or by a randomization procedure (Fisher, 1947). 

Referring to the assumption of normality underlying the use of statistical tests based on the t and z
distributions, Yule & Kendall (1947, p. 437) summarized a general view:

Some experiments have been made to throw light on the question whether they are true for other types
of universe. It appears that, provided the divergence of the parent from normality is not too great, the
results . . . true for normal universes are true to a large extent for other universes. But the whole situation
is obscure and it is to be hoped that in time investigators will be able to engage in the labour of a closer
inquiry.

Table 2 shows the degree of departure from normality measured by $g_1$ and $g_2$ and their standard
errors (Fisher, 1950, §19, C).

Depth I has been subdivided into two blocks of forty cubes owing to heterogeneity. Only at Depth I, in
rows 1–5, is $g_1$ less than its standard error. For the other half, rows 6–10, $g_1$ is approximately
Table 1. Root length (cm.) per cube of soil
Depth IV 


Column 
we 2 3 4 5 6 7 
1 120 10 30 9 2 8 
2 165 3 1 7 
3 39 13 46 8 9 3 
4 29 8 20 0 5 6 5 2l 
5 6 5 5 4 6 qu DN PESE 
6 5 3 IL "YS Oak en T 3 28 
7 27 8 os N 6 32 
8 68 42 24 "HERR 5 6 18 
9 69 30 5 7 9 Gan see 16 
10 T 9 em 21 5 0 20 
$2.47\,\sigma_{g_1}$. At successively greater depths $g_1$ is 5.4 to 15.8 times its standard error. Evidently the skewness
is strongly developed, in the sense that there are a few extremely high values and a large number of
intermediate low ones. (Values for root length per cube less than zero are, of course, impossible.) The
value of $g_2$ also becomes progressively larger down to Depth VII.

This degree of departure from normality is large enough to render the validity of variance ratio
tests uncertain for use in testing for lack of homogeneity in sample populations of variances. If such
tests could, however, be employed, some aspects of spatial distribution might be approached by an 
analysis of variance of root length in terms of the three dimensions of the volume investigated. 

In the present row and column layout the row and column and residual mean squares are estimates of 
the variance at any one depth, if no row and column effects exist, and the question is whether the ratios 
of these estimates follow the F distribution. The relevant scheme of randomization must allow the eighty 
observations at one depth to be freely distributed among the eighty cells available. Earlier studies have 
considered arrangements with more restricted, or otherwise different, systems of randomization. The
results from these may not be directly applicable here. 


Table 2. Departure from normality of frequency distribution of root length per cube

Depth    II     III    IV     V      VI     VII    VIII
g1       1.5    1.4    1.5    2.5    3.6    4.1    …
g2       3.2    1.8    3.0    7.0    15.9   20.9   2.1

Standard errors: n = 40, σ_g1 = 0.37, σ_g2 = 0.73; n = 80, σ_g1 = 0.27, σ_g2 = 0.53.


The distribution under randomization for the z transformation of the variance ratio was studied
empirically by Eden & Yates (1933). Their data consisted of eight sets of thirty-two observations of the
height of barley. All these sets showed less departure from normality than found here. The data were
then amalgamated into eight sets of four to simulate a randomized block design, a process which would
tend to reduce skewness. They made 1000 random arrangements of the resulting four 'treatments'
in eight blocks. They concluded that the empirically determined z distribution did not deviate
significantly from that expected from a normal population.

Later, Welch (1937) developed the problem theoretically in terms of a statistic U which is a
monotonically increasing function of z, and applied his results to uniformity trial data including both
randomized block and Latin-square designs. He concluded (p. 47):


For Randomized Blocks the cases considered showed close enough agreement between the randomization 
and normal theory variances of U. In each of the three uniformity trials for Latin Square, however, the 
randomization variance of U was considerably smaller than that of the normal theory. 


It should be noticed that the randomized block restricts randomization more than does the present
layout. Random rearrangement occurs within blocks only (equivalent, in our case, for example, to rows),
the block sum of squares remaining constant. The columns would then correspond to treatments.
The restriction on randomization is greater still in the case of the Latin square.

Box & Anderson (1955) have discussed the use of permutation tests to assess the effect of departures
from normality on standard statistical tests, with particular reference to the comparison of means and the
comparison of independent variances, where the standard of comparison is a value derived from a
theoretical population. They were able to show the importance of kurtosis as the major disturbing factor
and demonstrated the calculation and use of modified degrees of freedom for obtaining the distribution
of the relevant statistics. 

It would appear that there is evidence to suggest that, on the one hand, since we have free
randomization in the n × s cells, there may be a smaller tendency for our F distribution to depart from the
normal-theory form than is found in Welch's work (his theory is not directly applicable), yet, on the other hand,
the high degree of non-normality might give rise to considerable deviation in the F distribution.


Table 3. Empirical distribution of F under randomization

Depth IV
Class for F2       Class for F1       Obs. (F1)   Exp.
F ≥ 2.16           F ≥ 2.04              3          5
1.82–2.15          1.74–2.03             7          5
1.33–1.81          1.31–1.73            18         15
0.92–1.32          0.94–1.30            28         25
1/F 1.09–1.64      1/F 1.07–1.53        25         25
1/F 1.65–2.50      1/F 1.54–2.20        11         15
1/F 2.51–3.29      1/F 2.21–2.78         4          5
1/F ≥ 3.30         1/F ≥ 2.79            4          5
F2: χ² = 4.94, P = c. 70%, D.F. = 7.     F1: χ² = 4.03, P = 70–80%, D.F. = 7.

Depth VI
Class for F2       Obs. (F2)   Exp.   Class for F1
F ≥ 2.16              0          5    F ≥ 2.04
1.82–2.15             1          5    1.74–2.03
1.33–1.81            18         15    1.31–1.73
0.92–1.32            27         25    0.94–1.30
1/F 1.09–1.64        33         25    1/F 1.07–1.53
1/F 1.65–2.50        19         15    1/F 1.54–2.20
1/F 2.51–3.29         2          5    1/F 2.21–2.78
1/F ≥ 3.30            0          5    1/F ≥ 2.79
F2: χ² = 19.39, P = c. 2%, D.F. = 7.     F1: χ² = 11.71, P = 20%, D.F. = 7.


* An approximate theoretical approach is outlined by N. L. Johnson in the Note on p. 265. 
2. FREQUENCY DISTRIBUTION OF VARIANCE RATIO IN 100 RANDOM ARRANGEMENTS


To obtain empirical information on the distribution of the F statistic under randomization, one hundred
random selections of the possible permutations of the eighty values for each of the two Depths IV and
VI, having different degrees of departure from normality, were made by means of random numbers
(Fisher & Yates, 1949). Each value may, therefore, appear on sampling in any row or column.

For each of these random selections, calculations were made of:

(a) F1 = (m.s. rows)/(m.s. residual) for 9 and 63 D.F.;

(b) F2 = (m.s. columns)/(m.s. residual) for 7 and 63 D.F.

These are shown in Table 3 for Depths IV and VI. The frequencies in the tail ends of the distribution
curve are especially important. The class intervals in the tail ends were determined by the restriction
that no expected frequency should be fewer than five, as recommended for the χ² test, owing to the
continuity of the distribution of this quantity. The observed frequency of finding a value of F (or of
1/F for the lower percentage points) as great as or greater than that shown for the 5, 10, 25 and 50% levels for the
appropriate degrees of freedom (e.g. as given in the Merrington & Thompson tables (1943)) is
compared with that expected from the level of probability.
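A sketch of the randomization experiment in Python: each trial freely permutes all eighty values over the 10 × 8 layout and computes F1 (rows, 9 and 63 D.F.) and F2 (columns, 7 and 63 D.F.). The data in the usage example are hypothetical skewed values, not the root-length measurements of Table 1, and Python's pseudo-random generator stands in for the random-number tables; the function names are ours.

```python
import random


def row_col_f(grid):
    """Return (F1, F2) for an r x c table with one value per cell:
    F1 = MS(rows)/MS(residual), F2 = MS(columns)/MS(residual)."""
    r, c = len(grid), len(grid[0])
    grand = sum(map(sum, grid)) / (r * c)
    row_means = [sum(row) / c for row in grid]
    col_means = [sum(grid[i][j] for i in range(r)) / r for j in range(c)]
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in grid for v in row)
    ms_resid = (ss_total - ss_rows - ss_cols) / ((r - 1) * (c - 1))
    return (ss_rows / (r - 1)) / ms_resid, (ss_cols / (c - 1)) / ms_resid


def randomization_f(values, r, c, n_perm, seed=0):
    """Freely permute all r*c values over the cells n_perm times; return the (F1, F2) pairs."""
    rng = random.Random(seed)
    vals = list(values)
    results = []
    for _ in range(n_perm):
        rng.shuffle(vals)  # any value may land in any row and any column
        grid = [vals[i * c:(i + 1) * c] for i in range(r)]
        results.append(row_col_f(grid))
    return results


if __name__ == "__main__":
    gen = random.Random(1)
    data = [gen.expovariate(1 / 20) for _ in range(80)]  # hypothetical skewed sample
    fs = randomization_f(data, 10, 8, 100)
    # 2.04 and 2.16: upper 5% points for (9, 63) and (7, 63) D.F., as in the classes of Table 3
    print(sum(f1 >= 2.04 for f1, _ in fs), sum(f2 >= 2.16 for _, f2 in fs))
```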


(a) Depth IV

Evidently the values of F observed do not differ significantly in frequency from values which would
be expected from random arrangement of a normal population. In other words, the mean square for
rows and the mean square for columns are as good estimates of the true variance as one could expect
from a normal population.

(b) Depth VI

The distribution of F for the population found in Depth VI shows important differences from that
found in Depth IV.
With 9 D.F. available for the between-rows M.S., the observed value of χ² has P = 0.20, although there
is a tendency towards an excess of intermediate low values and a deficiency in the tails. With 7 D.F. available
for the columns M.S., the frequencies of F are significantly different from those expected from a normal
population. There is a deficiency of large and small values of F, and also an excess of intermediate low
values, as appeared in the case of rows.
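The χ² values quoted for Table 3 can be recomputed directly. The sketch below uses expected numbers 5, 5, 15, 25, 25, 15, 5, 5 in 100 trials for the eight classes, and the observed frequencies as read from the table. For Depth IV we take the sixth entry to be 11, the only value that makes the frequencies total 100 and reproduces χ² = 4.03; matching the reported χ² values, these appear to be the row (F1) frequencies at Depth IV and the column (F2) frequencies at Depth VI. The constant and function names are ours.

```python
def chi_square(observed, expected):
    """Pearson's X^2 = sum over classes of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Upper 5%, 5-10%, 10-25%, 25-50% classes for F, then the mirror-image classes for 1/F.
EXPECTED = [5, 5, 15, 25, 25, 15, 5, 5]
DEPTH_IV_F1 = [3, 7, 18, 28, 25, 11, 4, 4]   # sixth entry read as 11 (see above)
DEPTH_VI_F2 = [0, 1, 18, 27, 33, 19, 2, 0]

if __name__ == "__main__":
    print(round(chi_square(DEPTH_IV_F1, EXPECTED), 2))  # 7 D.F. (8 classes - 1)
    print(round(chi_square(DEPTH_VI_F2, EXPECTED), 2))
```

The Depth VI value is dominated by the four extreme classes, which contribute 15 of the 19.39, in keeping with the deficiency of large and small values of F noted above.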


3. CONCLUSIONS 


These results will apply only to the present randomization system, which is characterized by free 
distribution among the cells of the two-way table, with the degrees of freedom available. In order to 
generalize further it should be borne in mind that we are assuming that not only the random number 
tables used here, but also that other such tables will always sample in the same manner (Kendall, 1941). 


Table 4. Empirical % points and standard deviation of F

                         F2                                    F1
            Lower         Upper                  Lower         Upper
            5%    10%     10%    5%     s_F      5%    10%     10%    5%     s_F
Depth IV    0.27  0.36    1.58   2.12   0.54     0.36  0.52    1.78   2.03   0.50
Depth VI    0.44  0.49    1.48   1.56   0.37     0.45  0.51    1.52   1.96   0.43


Some theoretical values of these quantities are given by N. L. Johnson in a table on p. 266. 
With non-normality of the order found in Depth IV it seems that standard tables for F may be used
for tests of significance in the analysis of variance.

In such an extreme case as Depth VI, the standard deviation of F is much reduced (see Table 4, where
the 5 and 10% points are also given), especially in the case of F2, which has slightly fewer degrees of
freedom than F1. The standard tables would underestimate the frequency of occurrence of cases showing
significance at the 5% level. A complete mathematical analysis would evidently be desirable to cover
cases where the non-normality is as great as or greater than has been observed here, over a range of designs
with varying degrees of restriction on randomization.


I should like to thank Prof. E. S. Pearson for his stimulating suggestions on the implications of the 
empirical randomization. 
Theoretical considerations regarding H. R. B. Hack's system of randomization
for cross-classifications 


Note by N. L. JOHNSON 
University College London 


The empirical distribution of the F ratio investigated by H. R. B. Hack is similar to the randomization
distribution studied by Welch (Biometrika, 29, 21–52 (1937)). In the latter case the system of randomization
was restricted by the condition that observed values should remain in their original rows;
Hack, on the other hand, allows all rearrangements of the original data among the cells of the n × s
table to be possible.
Welch gives formulae for the randomization mean and variance of

$$U = \frac{\text{between column sum of squares}}{\text{between column sum of squares} + \text{residual sum of squares}}.$$


These are $\mathcal{E}(U) = 1/n$; $\operatorname{var}(U) = \dfrac{2(1 - A)}{n^2(s - 1)}$,
Where 4A-X0 iQ ia 


and $u_{ij}$ = (original observation in $i$th row and $j$th column) − (mean of all observations in $i$th row).


The expected value of U in Hack's case is also 1/n; in the formula for var(U), A is replaced
by $\bar A$, the average value of A over all possible assignments of observed values to cells. Evaluation
of $\bar A$ appears to be difficult, but if we neglect the row means (i.e. assume that the row means and

the grand mean are always zero) we find that


4 
ui 
2 «n-D[. 2 
and var (U) "Rl va qe 
1 
The randomization distribution of U might be approximated by a Type I distribution having the same
terminals and first two moments. If this is so, the corresponding distribution of the ratio

(between columns mean square)/(residual mean square)

would be an F distribution with degrees of freedom
V = (n—1l)v; „ = (#—1)(ns—1)/(ns—g)—2/n, 
EE 
= "= Seb 
tj 
Similar expressions would apply to the approximate distribution of the ratio 
(between rows mean square)/(residual mean square). 
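The moment-matching step can be sketched as follows; the derivation and the function names are ours, offered as an illustration under the stated approximation. A Type I (beta) distribution on (0, 1) with mean $1/n$ and variance $\operatorname{var}(U)$ has parameters $\alpha = T/n$ and $\beta = (1 - 1/n)T$, where $T = (1/n)(1 - 1/n)/\operatorname{var}(U) - 1$. Taking $\nu_1 = 2\alpha$ and $\nu_2 = 2\beta$, so that $\nu_2 = (n - 1)\nu_1$, the nominal ratio $(n - 1)U/(1 - U)$, which is exactly (between columns mean square)/(residual mean square), then has an $F(\nu_1, \nu_2)$ distribution. As a check, the normal-theory variance of U must give back the nominal degrees of freedom.

```python
def equivalent_df(n, var_u):
    """Fit a Type I (beta) distribution on (0, 1) with mean 1/n and variance var_u,
    and return (nu1, nu2) = (2 * alpha, 2 * beta); note nu2 = (n - 1) * nu1."""
    mean = 1.0 / n
    t = mean * (1.0 - mean) / var_u - 1.0  # alpha + beta from the first two moments
    return 2.0 * mean * t, 2.0 * (1.0 - mean) * t


def normal_theory_var_u(n, s):
    """var(U) when U ~ Beta((s - 1)/2, (n - 1)(s - 1)/2), i.e. under normal theory."""
    return 2.0 * (n - 1) / (n ** 2 * (n * (s - 1) + 2))


if __name__ == "__main__":
    # 10 rows and 8 columns, as in Hack's layout: expect (7, 63) for columns.
    print(equivalent_df(10, normal_theory_var_u(10, 8)))
```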

Using the data provided by Hack we obtain the 'equivalent' degrees of freedom shown below.

                     Columns (for F2)                          Rows (for F1)
                 Lower        Upper                        Lower        Upper
                 5%    10%    10%    5%     s_F            5%    10%    10%    5%
Normal theory    0.30  0.40   1.82   2.16   0.59           0.36  0.45   1.74   2.04
Depth IV         0.31  0.41   1.80   2.15   0.58           0.37  0.46   1.73   2.02
Depth VI         0.36  0.45   1.68   2.01   0.52           …     0.50   1.68   1.91


The values of the standard deviation of F were obtained from the expression

$$s_F = \frac{\nu_2}{\nu_2 - 2}\left\{\frac{2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 4)}\right\}^{1/2}.$$


and IV, though the actual results for columns at Depth VI are considerably less variable than
predicted by the approximate theory outlined above.
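The expression above is the standard deviation of the F distribution with $\nu_1$ and $\nu_2$ degrees of freedom. A minimal sketch (function name ours) reproduces the normal-theory value 0.59 for columns, with 7 and 63 D.F.

```python
import math


def f_sd(nu1, nu2):
    """Standard deviation of F(nu1, nu2):
    nu2/(nu2 - 2) * sqrt(2 (nu1 + nu2 - 2) / (nu1 (nu2 - 4))), finite only for nu2 > 4."""
    if nu2 <= 4:
        raise ValueError("the variance of F is infinite for nu2 <= 4")
    return nu2 / (nu2 - 2) * math.sqrt(2 * (nu1 + nu2 - 2) / (nu1 * (nu2 - 4)))


if __name__ == "__main__":
    print(round(f_sd(7, 63), 2), round(f_sd(9, 63), 2))  # columns, rows in a 10 x 8 table
```

For rows (9 and 63 D.F.) the same formula gives about 0.53.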
On the equivalence of two tests of equality of rate of occurrence in
two series of events occurring randomly in time


By D. E. BARTON
University College London 


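We suppose two time periods of length $t_1$ and $t_2$ in which $n_1$, $n_2$ events are observed. On the assumption
that each is the realization of a Poisson process, with rates $\lambda_1$ and $\lambda_2$ respectively, so that

$$p(n_i) = \frac{(\lambda_i t_i)^{n_i}\,e^{-\lambda_i t_i}}{n_i!} \quad (i = 1, 2), \qquad (1)$$

it is desired to test the hypothesis $\lambda_1 = \lambda_2$.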

Przyborowski & Wilenski (1939) noted that, conditionally on $n_1 + n_2 = n$, $n_1$ is binomially distributed
with parameters $n$ and $p = t_1/(t_1 + t_2)$. Considering the symmetrical test they proposed for the case $t_1 = t_2$, we have
rejection of the null hypothesis, with a first kind of error $\alpha$, if

(a) $n_1 \le k_1(n, \alpha)$ or (b) $n_1 \ge k_2(n, \alpha)$,

where

$$\Pr\{n_1 \le k_1(n, \alpha) \mid n, \lambda_1 = \lambda_2\} \le \tfrac12\alpha, \qquad (2)$$

and where, in this case, $k_2(n, \alpha) = n - k_1(n, \alpha)$. Their method extends directly to the case $t_1 \neq t_2$, the only
difference being that $p \neq \tfrac12$, in general, so that $k_1$ and $n - k_2$ are, in general, different from one another.
Cox (1953) proposed what was apparently a very different test. It is the purpose of this note to show
that the two tests are substantially equivalent.


Table 1. Values taken by u, v, w for different values of $n_1$, when n = 10, 20 and p = ½

n = 10
n1     0      1      2      3      4      5      6      7      8      9
u    0.000  0.006  0.033  0.113  0.274  0.500  0.726  0.887  0.967  0.994
v    0.001  0.011  0.055  0.172  0.377  0.623  0.828
w           0.004  0.026  0.102  0.265  0.500  0.735  0.897  0.974  0.996

n = 20
n1     0      1      2      3      4      5      6      7      8      9      10
u    0.000  0.000  0.000  0.001  0.004  0.013  0.039  0.095  0.192  0.332  0.500
v    0.000  0.000  0.000  0.001  0.006  0.021  0.058  0.132  0.252  0.412  0.588
w    0.000  0.000  0.000  0.001  …      0.012  0.036  0.089  0.186  0.328  0.500
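As a check on Table 1, the sketch below recomputes v and u for a binomial $n_1$ with $p = \tfrac12$ in exact rational arithmetic. It takes $v(n_1) = \Pr(n_1' \le n_1)$, the cumulative distribution function as stated in the text, and $u(n_1) = \Pr(n_1' < n_1) + \tfrac12\Pr(n_1' = n_1)$; this mid-probability definition of the u-form is our assumption, chosen because it reproduces the tabulated u values. The function name is ours.

```python
from fractions import Fraction
from math import comb


def u_v_transforms(n, p=Fraction(1, 2)):
    """Return lists u, v indexed by n1 = 0..n for a binomial(n, p) variable n1'.

    v(n1) = Pr(n1' <= n1)                      (cumulative distribution function)
    u(n1) = Pr(n1' < n1) + 1/2 Pr(n1' = n1)    (mid-probability form; an assumption)
    """
    u, v, below = [], [], Fraction(0)
    for k in range(n + 1):
        prob = comb(n, k) * p ** k * (1 - p) ** (n - k)
        u.append(below + prob / 2)
        below += prob
        v.append(below)
    return u, v


if __name__ == "__main__":
    u10, v10 = u_v_transforms(10)
    print([round(float(x), 3) for x in u10])
    print([round(float(x), 3) for x in v10])
```

For n = 20 the same computation gives v(8) = 263950/1048576, about 0.252.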


Let us consider the upper tail for simplicity, the argument for the lower tail being the same. Then,
using the incomplete B-function representation of a sum of binomial probabilities, the rule (b) may
be expressed

$$I_p(n_1, n_2 + 1) \le \tfrac12\alpha, \qquad (3)$$

which amounts to transforming $n_1$ by the discrete analogue of the probability integral transformation
using the v-form of David & Johnson (1950), i.e. where the transform $v = v(n_1)$ is the cumulative
probability distribution function of $n_1$. Let us modify this slightly to

$$I_p(n_1 + \tfrac12, n_2 + \tfrac12) \le \tfrac12\alpha, \qquad (4)$$

so that the first kind of error is not made so strongly less than $\alpha$ by the discontinuity of the distribution.
This form of 'continuity correction' is in practice hardly to be distinguished from David & Johnson's
u-form of discrete probability integral transformation, where $u = \Pr(n_1' < n_1) + \tfrac12\Pr(n_1' = n_1)$: denoting
the left-hand side of (4) by $1 - w$, and taking for example the values $(20, \tfrac12)$, $(10, \tfrac12)$ for $(n, p)$, u and w
are seen in Table 1 to run through a very nearly equal set of values as $n_1$ goes from 0 to n. Since
the departure from rectangularity manifests itself in a decreased variance, it is of interest to note that
(4) is equivalent to

$$\Pr\left\{F \le \frac{t_1(2n_2 + 1)}{t_2(2n_1 + 1)}\right\} \le \tfrac12\alpha, \qquad (5)$$

where F has $(2n_1 + 1, 2n_2 + 1)$ degrees of freedom.
Now if $F(\varepsilon; \nu_1, \nu_2)$ is the lower $100\varepsilon\,\%$ point of an F having $\nu_1$, $\nu_2$ degrees of freedom, we may write (5) as

$$\frac{t_1}{t_2}\cdot\frac{2n_2 + 1}{2n_1 + 1} \le F(\tfrac12\alpha;\ 2n_1 + 1,\ 2n_2 + 1), \qquad (6)$$

which is the lower tail rejection rule of Cox.
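The identities behind (3) to (6) can be checked numerically. The sketch below implements the regularized incomplete beta function $I_x(a, b)$ with the standard continued-fraction expansion (the Numerical Recipes form; this choice of algorithm is ours). It confirms that $I_p(n_1, n_2 + 1)$ equals the binomial upper-tail sum and evaluates the continuity-corrected quantity $I_p(n_1 + \tfrac12, n_2 + \tfrac12)$ with $p = t_1/(t_1 + t_2)$. The passage to Cox's F form rests on the known fact that $I_x(a, b)$ is the distribution function of $F(2a, 2b)$ evaluated at $bx/\{a(1 - x)\}$. Function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for I_x(a, b), evaluated by the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def binom_upper_tail(k, n, p):
    """Pr(n1' >= k) for n1' ~ binomial(n, p), summed directly."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


def corrected_tail(n1, n2, t1, t2):
    """Left-hand side of the continuity-corrected upper-tail rule: I_p(n1 + 1/2, n2 + 1/2)."""
    return betainc(n1 + 0.5, n2 + 0.5, t1 / (t1 + t2))


if __name__ == "__main__":
    # Reject in the upper tail at total level alpha if the value is <= alpha / 2.
    print(corrected_tail(8, 2, 1.0, 1.0), binom_upper_tail(8, 10, 0.5))
```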
It will be noted that without the continuity correction (4), Przyborowski & Wilenski's test consists
of using the rejection regions described in Cox's language by

(a) $\dfrac{t_2}{t_1}\cdot\dfrac{2n_1 + 2}{2n_2} \le F(\tfrac12\alpha;\ 2n_2,\ 2n_1 + 2)$,

(b) $\dfrac{t_1}{t_2}\cdot\dfrac{2n_2 + 2}{2n_1} \le F(\tfrac12\alpha;\ 2n_1,\ 2n_2 + 2)$.

This test has an error of the first kind certainly less than $\alpha$, but generally more so than is desirable, whereas
Cox's modification has an error which will be greater or less than $\alpha$, randomly, according to n (and also
varying with p). This error will be approximately $\alpha$ when averaged over n (as it must be in the
experimental circumstances envisaged, where n will have a truncated Poisson distribution of some sort). This
follows from the relation of (4) to David & Johnson's u-form of discrete probability integral
transformation. The numerical results given by Cox in his Table 1 serve to verify David & Johnson's general
conclusions in yet another instance, but it will be interesting to see the complete results of the systematic
investigation of the problem on the lines they lay down, now being carried out by D. H. Young, who
has kindly provided the figures of Table 1.
The mathematical relation between Greenberg's index of linguistic
diversity and Yule’s characteristic 


By G. HERDAN 
University of Bristol 


At first glance, the two measures mentioned in the title are different both in form and meaning:
Greenberg's constant is meant to measure the diversity of language in a speech community, and is
mathematically the complement to unity of the sum of the squared probabilities of these languages.
Yule's constant characterizes the vocabulary occurrence (type-token) relation in a given text, and is
mathematically the ratio of the second to the squared first moment of the word count distribution.


GREENBERG'S CONSTANT 


The examination of any map of linguistic areas will show regions of greater diversity and others of 
relative uniformity, while still others may seem intermediate between these extremes. Greenberg's 
aim is to have quantitative objective measures of diversity by which to replace such subjective impres- 
sions. The simplest model, called the monolingual non-weighted method A, may be described as follows 
(Greenberg, 1956). 


If from a given area we choose two members of the population at random, the probability that these
two individuals speak the same language can be considered a measure of its linguistic uniformity. If
everyone speaks the same language, the probability that two such individuals speak the same language
is obviously 1, or certainty. If each individual speaks a different language, the probability is zero.
Since we are measuring diversity rather than uniformity, this measure must be subtracted from 1, so
that our index will vary from 0, indicating the least diversity, to 1, indicating the greatest.

If the area comprises speakers of r languages and the proportion of speakers of language i is $p_i$
(i = 1, 2, ..., r), then the total probability of choosing two speakers of the same language is the sum of
the probabilities of such an event for each individual language, i.e. $\sum p_i^2$. Subtracting this sum from 1
as suggested above, the formula for the coefficient of linguistic diversity becomes

$$A = 1 - \sum_{i=1}^{r} p_i^2. \qquad (1)$$


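This may be illustrated by a hypothetical example. If in a population three languages are used and
the proportions of speakers are as 1 : 3 : 4, then

$$A = 1 - \left[\left(\tfrac18\right)^2 + \left(\tfrac38\right)^2 + \left(\tfrac48\right)^2\right] = 1 - \tfrac{13}{32} = \tfrac{19}{32}, \text{ or } 0.594.$$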
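Formula (1) and the worked example can be reproduced in exact arithmetic; the function name below is ours, and the input is simply the relative numbers of speakers of each language.

```python
from fractions import Fraction


def greenberg_a(speaker_counts):
    """Greenberg's monolingual non-weighted index A = 1 - sum(p_i^2)."""
    total = sum(speaker_counts)
    return 1 - sum(Fraction(c, total) ** 2 for c in speaker_counts)


if __name__ == "__main__":
    a = greenberg_a([1, 3, 4])  # speakers in the proportions 1 : 3 : 4
    print(a, float(a))          # 19/32 0.59375
```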


YULE'S CHARACTERISTIC (Yule, 1944)
On the other hand, Yule's Characteristic K is

$$K = 10^4\left(\frac{S_2}{S_1^2} - \frac{1}{S_1}\right), \qquad (2)$$

where $S_1$ and $S_2$ are the first and second moments, respectively, of the word count distribution, that is
the distribution of vocabulary items according to frequency of occurrence: $S_1 = \sum f_X X$, $S_2 = \sum f_X X^2$,
where $f_X$ is the number of words occurring X times. For large samples, and neglecting the factor $10^4$,
which Yule used only so as to avoid very small values of K, the Characteristic becomes

$$K^* = \frac{S_2}{S_1^2}, \qquad (3)$$
which is a characteristic constant of the word count distribution epitomizing the relation between 
vocabulary and word occurrence, that is frequency of use, in a given literary text (or author). It may be 
regarded as measuring the extent to which word occurrences are concentrated upon particular vocabu- 
lary items, and it shows that a particular style is characterized by a constant relation between uniformity 
and diversity in the number of repetitions of the items of vocabulary (Herdan, 1956). 
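A sketch computing K and K* from a word count distribution follows. It assumes the distribution is given as a mapping from X (the number of times a word occurs) to $f_X$ (the number of words occurring X times), a representation chosen here for illustration; the function name is ours.

```python
def yule_k(spectrum):
    """spectrum: dict mapping X -> f_X.

    Returns (K, K_star), where K = 1e4 * (S2 / S1**2 - 1 / S1) and K* = S2 / S1**2,
    with S1 = sum(f_X * X) and S2 = sum(f_X * X**2)."""
    s1 = sum(f * x for x, f in spectrum.items())
    s2 = sum(f * x * x for x, f in spectrum.items())
    return 1e4 * (s2 / s1 ** 2 - 1 / s1), s2 / s1 ** 2


if __name__ == "__main__":
    # Hypothetical text: three words used once and one word used twice (S1 = 5, S2 = 7).
    print(yule_k({1: 3, 2: 1}))
```

The term $1/S_1$ vanishes as the text grows, which is why K* is the large-sample form of K.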


THE REPEAT RATE 


Yule's Characteristic is thus a measure of stylistic diversity, and is mathematically expressed as the ratio
of the second to the squared first moment of the word count, whereas Greenberg's constant is a measure
of linguistic diversity among the members of a speech community and is mathematically the complement to unity of the sum of the
squared probabilities of the various languages, which makes them look rather different from one another.

However, the close relation between the two becomes apparent if we use the interpretation, given in
a paper by Good (1953), of the Characteristic K as the repeat rate of words, i.e. the probability that two
words chosen at random from a text will be the same dictionary word. That paper suggests methods of
estimating, among other things, various general population parameters measuring heterogeneity. One
such quantity, for which Good uses the symbol $\hat\theta_2$, is an estimate of the probability that two words
selected at random from the text under consideration will turn out to be the same word of the language.
Hence, it tends to be larger the more repetitive is the author's vocabulary, or, more roughly, the smaller
is the author's vocabulary.
If the population probabilities of the distinct words are $p_1, p_2, \ldots$, then, as Good has shown,

$$\hat\theta_2 = \frac{\sum_X X(X - 1)\,f_X}{S_1(S_1 - 1)} \qquad (4)$$

represents an unbiased estimate of $p_1^2 + p_2^2 + \ldots$. Since Yule's Characteristic K is equal to

$$K = 10^4\left(\frac{S_2}{S_1^2} - \frac{1}{S_1}\right) = 10^4\,\frac{S_2 - S_1}{S_1^2} = 10^4\,\hat\theta_2\left(1 - \frac{1}{S_1}\right), \qquad (5)$$

it follows that it tends to $10^4(p_1^2 + p_2^2 + \ldots)$ as $S_1 \to \infty$. Thus, Greenberg's measure of linguistic diversity A and
Yule's Characteristic K both have the nature of a repeat rate (A being its complement to unity), and are therefore
of essentially the same mathematical structure.
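Two properties of Good's estimate used above can be checked exactly: its unbiasedness, by enumerating the binomial distribution of counts for a hypothetical two-word vocabulary, and the identity $K = 10^4\hat\theta_2(1 - 1/S_1)$. The functions below work on per-word counts X, and their names are ours.

```python
from fractions import Fraction
from math import comb


def repeat_rate_estimate(counts):
    """Good's estimate from per-word counts X: sum(X (X - 1)) / (S1 (S1 - 1))."""
    s1 = sum(counts)
    return Fraction(sum(x * (x - 1) for x in counts), s1 * (s1 - 1))


def yule_k_exact(counts):
    """K = 10^4 (S2 / S1^2 - 1 / S1) in exact arithmetic."""
    s1 = sum(counts)
    s2 = sum(x * x for x in counts)
    return 10 ** 4 * (Fraction(s2, s1 ** 2) - Fraction(1, s1))


def expected_estimate_two_words(p1, s1):
    """Exact E[theta_hat] when s1 tokens are drawn from two words with probabilities p1, 1 - p1."""
    p2 = 1 - p1
    total = Fraction(0)
    for x1 in range(s1 + 1):
        prob = comb(s1, x1) * p1 ** x1 * p2 ** (s1 - x1)
        total += prob * repeat_rate_estimate([x1, s1 - x1])
    return total


if __name__ == "__main__":
    print(expected_estimate_two_words(Fraction(3, 10), 3))  # equals 0.3^2 + 0.7^2
```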


ILLUSTRATION

If the summation in the numerator and the denominator of formula (3) refers not to $f_X$, i.e. groups
of words of equal occurrence frequency, but to the individual words, the formula becomes

$$K^* = \frac{\sum X^2}{\left(\sum X\right)^2},$$

where X is the frequency of occurrence of a word, and different values of X imply different probabilities
of occurrence. In Greenberg's formula for A, the corresponding probabilities of occurrence are the values
of $p_i$ for the different languages. Using the values of $p_i$ from our hypothetical example and assuming
a sample of the population of, say, 1000 inhabitants, the values of X result as


⅛ × 1000 = 125
⅜ × 1000 = 375
½ × 1000 = 500
Total      1000

and the characteristic is calculated as

$$K^* = \frac{125^2 + 375^2 + 500^2}{1000^2} = \frac{406{,}250}{1{,}000{,}000} = 0.406,$$

and $A = 1 - K^* = 0.594$,
which is the value of A calculated by formula (1), in accordance with the conclusion, reached on 


theoretical grounds, of the essential similarity in mathematical structure of Greenberg's coefficient 
of linguistic diversity A and Yule’s stylo-statistical Characteristic K. 
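The arithmetic of the illustration can be reproduced in a few lines of Python. The function name `repeat_rate` is ours, not the author's; the counts are those of the hypothetical population above.

```python
from fractions import Fraction


def repeat_rate(counts):
    """K* = sum X^2 / (sum X)^2: the chance that two draws (with
    replacement) from the population fall in the same category."""
    total = sum(counts)
    return Fraction(sum(x * x for x in counts), total * total)


# Hypothetical population of 1000 split in proportions 1/8, 3/8, 1/2
counts = [125, 375, 500]
k_star = repeat_rate(counts)
a = 1 - k_star  # Greenberg's diversity A
print(float(k_star), float(a))  # 0.40625 0.59375
```

Rounded to three figures these are the values 0.406 and 0.594 quoted in the text.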
Note on a discontinuous probability density


By J. E. KERRICH 
University of the Witwatersrand, Johannesburg 


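Consider the probability density p(x) defined as follows:

$$p(x) = a\,x_0^{a}\,x^{-a-1} \qquad (x_0 \le x \le \xi;\ a > 0),$$
$$p(x) = b\,x_0^{a}\,\xi^{b-a}\,x^{-b-1} \qquad (\xi \le x < \infty;\ b > 0), \qquad (1)$$

and is zero for all other values of x.

At $x = \xi$, p(x) has a saltus of amount

$$(b - a)\,x_0^{a}\,\xi^{-a-1}. \qquad (2)$$

Then, if $P(x) = \int_x^{\infty} p(x)\,dx$,

$$P(x) = (x_0/x)^{a} \qquad (x_0 \le x \le \xi),$$
$$P(x) = (x_0/\xi)^{a}\,(\xi/x)^{b} \qquad (\xi \le x < \infty), \qquad (3)$$

and is a continuous function in the range $x_0 \le x < \infty$.

Writing $z = \log_{10} x$, $\zeta = \log_{10}\xi$ and $z_0 = \log_{10} x_0$, we have

$$\log_{10} P(x) = a(z_0 - z) \qquad (z_0 \le z \le \zeta),$$
$$\log_{10} P(x) = a(z_0 - \zeta) + b(\zeta - z) \qquad (\zeta \le z < \infty). \qquad (4)$$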
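As a check on (1) to (4), the Python sketch below codes the density, the tail function P(x) and its logarithm. The parameter values used with it are arbitrary illustrations, not fitted values; they serve only to confirm the continuity of P, the two-line form (4) and the saltus (2).

```python
import math


def density(x, a, b, x0, xi):
    """p(x) of (1); zero below x0."""
    if x < x0:
        return 0.0
    if x <= xi:
        return a * x0**a * x ** (-a - 1)
    return b * x0**a * xi ** (b - a) * x ** (-b - 1)


def tail(x, a, b, x0, xi):
    """P(x) = integral of p from x to infinity, equation (3)."""
    if x < x0:
        return 1.0  # all the probability lies above x0
    if x <= xi:
        return (x0 / x) ** a
    return (x0 / xi) ** a * (xi / x) ** b


def log10_tail(z, a, b, z0, zeta):
    """Equation (4): two straight lines in z = log10 x meeting at z = zeta."""
    if z <= zeta:
        return a * (z0 - z)
    return a * (z0 - zeta) + b * (zeta - z)


def saltus(a, b, x0, xi):
    """Jump (2) in the density at x = xi."""
    return (b - a) * x0**a * xi ** (-a - 1)
```

With, say, a = 1.2, b = 2.5, x0 = 0.5 and xi = 3.0, `math.log10(tail(x, ...))` agrees with `log10_tail(math.log10(x), ...)` on both sides of xi, and the difference between the density just above and at xi equals `saltus(...)`.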


The above distribution function was suggested by the data given in Table 1 and Fig. 1. Attempts to
fit this function by maximum likelihood methods were unsuccessful, and a modification of a method
used in an earlier paper (Kerrich, 1949) was used instead. The technique described is essentially a
large-sample technique.

The range $x_0 \le x < \infty$ is divided into k+1 intervals $x_i \le x < x_{i+1}$ (i = 0 to k), and out of n observations
of x, $f_i$ fall within the interval $x_i \le x < x_{i+1}$, with $x_{k+1} = \infty$. It is assumed that the observations form a random
sample in the sense that for given n the $f_i$ have a multinomial distribution.
Then, if $y_i = (f_i + f_{i+1} + \cdots + f_k)/n$ (i = 1, ..., k), it follows that for large n the $y_i$ are approximately normally correlated, with mean values and covariances

$$E(y_i) = P(x_i) = P_i, \text{ say}, \qquad (5)$$
$$\operatorname{cov}(y_i, y_j) = \frac{1}{n}\,P_j(1 - P_i) \qquad (P_j \le P_i). \qquad (6)$$

Next, if $w_i = \log_{10} y_i$, it follows that for large n the $w_i$ are approximately normally correlated with

$$E(w_i) = \log_{10} P(x_i),$$

which has the values given in (4), and

$$\operatorname{cov}(w_i, w_j) = \frac{\mu^2}{n}\,\frac{1 - P_i}{P_i}, \qquad (7)$$

where $\mu = \log_{10} e = 0.4343\ldots$ and $P_j \le P_i$.


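By graphical methods initial estimates are obtained for a, b and ζ (see Fig. 1). Call these estimates
$a_0$, $b_0$ and $\zeta_0$, respectively, and let $\Delta a = a - a_0$, $\Delta b = b - b_0$ and $\Delta\zeta = \zeta - \zeta_0$.
Make the minor transformation

$$v_i = w_i - a_0(z_0 - z_i) \qquad \text{when } z_i \le \zeta_0 \text{ and } i = 1, 2, \ldots, m, \qquad (8)$$
$$v_i = w_i - a_0(z_0 - \zeta_0) - b_0(\zeta_0 - z_i) \qquad \text{when } z_i > \zeta_0 \text{ and } i = m+1, \ldots, k.$$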
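Then, assuming that terms of higher order are negligible,

$$\beta_i = E(v_i) = \Delta a\,(z_0 - z_i) \qquad \text{when } z_i \le \zeta_0, \qquad (9)$$
$$\beta_i = \Delta a\,(z_0 - \zeta_0) + \Delta b\,(\zeta_0 - z_i) + \Delta\zeta\,(b_0 - a_0) \qquad \text{when } z_i > \zeta_0,$$

and

$$\operatorname{cov}(v_i, v_j) = \operatorname{cov}(w_i, w_j) \qquad (10)$$

as in (7). Thus the $v_i$ are approximately multinormally distributed with joint probability density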


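$$\frac{|C|^{1/2}}{(2\pi)^{k/2}}\,\exp\{-\tfrac{1}{2}(V - B)'C(V - B)\}, \qquad (11)$$

where V is the column vector $\{v_i\}$, B is the column vector $\{\beta_i\}$, and C is the inverse of the covariance matrix

$$\|\operatorname{cov}(w_i, w_j)\|. \qquad (12)$$

The elements of C are

$$C_{ii} = \frac{n}{\mu^2}\left(\frac{P_{i-1}P_i}{P_{i-1} - P_i} + \frac{P_i P_{i+1}}{P_i - P_{i+1}}\right), \qquad (13)$$
$$C_{i,i+1} = -\frac{n}{\mu^2}\,\frac{P_i P_{i+1}}{P_i - P_{i+1}} = C_{i+1,i},$$

with $P_0 = 1$ and $P_{k+1} = 0$, and the remaining $C_{ij}$ are zero. In matrix notation, (9) becomes

$$B = K\Gamma, \text{ say}, \qquad (14)$$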


where Γ is the column vector of $\gamma_1 = \Delta a$, $\gamma_2 = \Delta b$ and $\gamma_3 = \Delta\zeta$, which are the three quantities to be
estimated, and K is the k × 3 matrix of the coefficients appearing on the right-hand side of equations (9).
(In numerical applications m is chosen so that $z_m \le \zeta_0 < z_{m+1}$.)
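The tridiagonal form of C in (13) rests on the conventions $P_0 = 1$ and $P_{k+1} = 0$. A minimal Python check, using hypothetical values of $P_i$ and n, builds the covariance matrix (7) and the matrix given by (13) and confirms that their product is the identity.

```python
import math

MU = 1 / math.log(10)  # log10(e) = 0.4343...


def cov_w(P, n):
    """Covariance (7): cov(w_i, w_j) = mu^2 (1 - P_i) / (n P_i) for i <= j,
    where P is decreasing, so the smaller index carries the larger P."""
    k = len(P)
    return [[MU**2 * (1 - P[min(i, j)]) / (n * P[min(i, j)]) for j in range(k)]
            for i in range(k)]


def precision_C(P, n):
    """Tridiagonal matrix (13), with the conventions P_0 = 1, P_{k+1} = 0."""
    k = len(P)
    ext = [1.0] + list(P) + [0.0]
    c = n / MU**2

    def g(u, v):  # P_{i-1} P_i / (P_{i-1} - P_i)
        return u * v / (u - v)

    C = [[0.0] * k for _ in range(k)]
    for i in range(1, k + 1):  # i is the paper's 1-based index
        C[i - 1][i - 1] = c * (g(ext[i - 1], ext[i]) + g(ext[i], ext[i + 1]))
        if i < k:
            C[i - 1][i] = C[i][i - 1] = -c * g(ext[i], ext[i + 1])
    return C


def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]
```

The check works because (7) has the structure $\sigma_{ij} = s_{\min(i,j)}$ with $s_i$ increasing, and any such matrix has a tridiagonal inverse whose entries depend only on the increments $s_i - s_{i-1}$.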
Using (14), (11) becomes

$$\frac{|C|^{1/2}}{(2\pi)^{k/2}}\,\exp\{-\tfrac{1}{2}(V - K\Gamma)'C(V - K\Gamma)\}, \qquad (15)$$

and if $\hat\gamma_i$ is the least-squares estimate of $\gamma_i$ (i = 1 to 3), then the $\hat\gamma_i$ are the solutions of the normal equations

$$K'CK\,\hat\Gamma = K'CV, \qquad (16)$$

in which $\hat\Gamma = \{\hat\gamma_i\}$ is the column vector of the $\hat\gamma_i$. Next, $(V - K\Gamma)'C(V - K\Gamma)$ is to be split up into

$$(V - K\hat\Gamma)'C(V - K\hat\Gamma) + (\hat\Gamma - \Gamma)'K'CK(\hat\Gamma - \Gamma), \qquad (17)$$

and it can then be shown that the $\hat\gamma_i$ are normally correlated with means $\gamma_i$ and covariance matrix

$$(K'CK)^{-1}, \qquad (18)$$

while

$$(V - K\hat\Gamma)'C(V - K\hat\Gamma) = \chi^2 \qquad (19)$$

has a $\chi^2$ distribution with k − 3 D.F. In practical applications C is unknown and has to be estimated by
replacing the $P_i$ by the approximations obtained when $a_0$, $b_0$ and $\zeta_0$ are used.
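The estimation step is generalized least squares. The following is a minimal pure-Python sketch of (9), (16) and (19), assuming that the initial estimates $a_0$, $b_0$, $\zeta_0$ and an estimate of C are already available; the helper names are hypothetical, and in practice a linear-algebra library would normally replace the hand-written solver.

```python
def design_matrix(z, z0, zeta0, a0, b0):
    """Rows of K from (9): coefficients of (delta a, delta b, delta zeta)."""
    rows = []
    for zi in z:
        if zi <= zeta0:
            rows.append([z0 - zi, 0.0, 0.0])
        else:
            rows.append([z0 - zeta0, zeta0 - zi, b0 - a0])
    return rows


def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    m = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (M[r][m] - s) / M[r][r]
    return x


def gls_fit(K, C, V):
    """Solve the normal equations K'CK g = K'CV, (16), and return g
    together with the residual quadratic form (19)."""
    k, p = len(K), len(K[0])
    CK = [[sum(C[i][l] * K[l][j] for l in range(k)) for j in range(p)] for i in range(k)]
    A = [[sum(K[l][i] * CK[l][j] for l in range(k)) for j in range(p)] for i in range(p)]
    CV = [sum(C[i][l] * V[l] for l in range(k)) for i in range(k)]
    rhs = [sum(K[l][i] * CV[l] for l in range(k)) for i in range(p)]
    g = solve(A, rhs)
    res = [V[i] - sum(K[i][j] * g[j] for j in range(p)) for i in range(k)]
    chi2 = sum(res[i] * C[i][l] * res[l] for i in range(k) for l in range(k))
    return g, chi2
```

If V is exactly linear in the rows of K, the fit returns the generating corrections with a zero quadratic form; with real data that form is referred to $\chi^2$ on k − 3 degrees of freedom, and the corrections are added to $a_0$, $b_0$ and $\zeta_0$.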

From (18), confidence intervals are obtained for the values of a, b and ζ, and, if required, confidence
belts for the position of the straight lines in (4).

Expression (19) provides a goodness-of-fit test and checks whether the model laid down in (1) is a plausible one
for the data in hand.

In the practical application considered here the data in Table 1 refer to the particle size distribution
of airborne dust in a gold mine.


Table 1

    No. of       Percentage of particles
    particles    of size > x (100y)
        1              0.09
        2              0.36
        5              1.00
       14              2.72
       50              8.51
      130             24.80
      160             51.04
      381            100.00

Fig. 1. The data are used with permission of the Transvaal and O.F.S. Chamber of Mines.


Dust is precipitated on to a glass slide. A small area on the slide is examined under a microscope. The
field of view contains a set of small circles of known diameter. The observer matches each particle
observed with one of these circles. Thus, of 381 particles stated to be of size 0.5, it is assumed that half are
greater and half are less than that size, and so on for the other sizes given. With the microscope used,
there is a lower limit $x_0$ (corresponding to 0.5 arbitrary units) below which practically nothing is known
about the particle size distribution. No attempt is made here to make any judgement about what
happens in that region.
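The percentages in Table 1 can be reproduced from the particle counts under the half-above, half-below convention just described, taking as base the number of particles exceeding the lower limit $x_0$ (all larger particles plus half of the 381 at that size, i.e. 552.5). This reading of the table is ours; it reproduces the printed column to within rounding, the fourth entry coming out as 2.715 against the printed 2.72.

```python
def percentage_exceeding(counts):
    """counts: particles per size class, largest size first and the class
    at the lower limit x0 last. Each class is split half above and half
    below its stated size; the base is the number of particles above x0."""
    base = sum(counts) - counts[-1] / 2
    out, larger = [], 0
    for f in counts:
        out.append(100 * (larger + f / 2) / base)
        larger += f
    return out


counts = [1, 2, 5, 14, 50, 130, 160, 381]
for f, pct in zip(counts, percentage_exceeding(counts)):
    print(f, round(pct, 2))
```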

In Fig. 1, $w = \log_{10} y$ is plotted against $z = \log_{10} x$. The fact that the points lie close to two straight
lines suggested the model discussed in this note.




Parameter    95% confidence limits


$$(V - K\hat\Gamma)'C(V - K\hat\Gamma) = 2.860 = \chi^2_4.$$


Since the corresponding P for 4 D.F. is approximately 0.60, the 'fit' is satisfactory. Even if a physicist
might not care to accept the model discussed here as the 'true' law of distribution, it appears to be a
reasonable approximation in this particular case and in several other similar cases. The emphasis in
this note is on methods and not on applications.
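The quoted tail probability is easy to confirm. For an even number of degrees of freedom the upper tail of $\chi^2$ has a closed form, which for 4 D.F. is $e^{-x/2}(1 + x/2)$. Reading the printed statistic as 2.860, the sketch below gives about 0.58, in line with the 'approximately 0.60' of the text.

```python
import math


def chi2_upper_tail_even(x, df):
    """P(chi-square with df degrees of freedom > x), for even df:
    exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)**j / j!"""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    h = x / 2
    return math.exp(-h) * sum(h**j / math.factorial(j) for j in range(df // 2))


print(round(chi2_upper_tail_even(2.860, 4), 3))  # 0.582
```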
A remark on Spearman's rank correlation coefficient


By HARALD BERGSTRÖM
Chalmers University of Technology, Göteborg


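The well-known Spearman's correlation coefficient ρ(P) for a permutation

$$P = \begin{pmatrix} 1 & 2 & \cdots & n \\ k_1 & k_2 & \cdots & k_n \end{pmatrix}$$

is given by

$$\rho(P) = 1 - \frac{6\,d(P)}{n^3 - n}, \qquad \text{where } d(P) = \sum_{i=1}^{n} (i - k_i)^2.$$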


The distribution function of ρ(P) has been studied in the case when all permutations are possible and
have the same probability. When the possible permutations belong to a subgroup (in the usual
sense) g of the symmetric group $S_n$ of all permutations, I have found the following results.

If g is a transitive subgroup of $S_n$, i.e. a subgroup such that the figure 1 can be transformed into the figures
1, 2, ..., n by the permutations of g, then the mean

$$E_g[d(P)] = \frac{1}{\operatorname{ord}(g)} \sum_{P \in g} d(P)$$

(ord(g) denoting the order of g) is the same for all such subgroups:

$$E_g[d(P)] = E_{S_n}[d(P)] = \frac{n^3 - n}{6}. \qquad (1)$$

If furthermore g is doubly transitive, i.e. if (1, 2) can be transformed into every system (ν, μ), ν ≠ μ,
ν, μ = 1, 2, ..., n, by a permutation of g, then d(P) also has the same variance for all such g.
In fact, I show that

$$E_g[d^2(P)] = E_{S_n}[d^2(P)] \qquad (2)$$

for all $g \subset S_n$ which are doubly transitive. Then, of course, we also have

$$\operatorname{var}_g[d(P)] = E_g[d^2(P)] - E_g^2[d(P)] = \operatorname{var}_{S_n}[d(P)]. \qquad (3)$$

As a consequence of (1) and (3) we get

$$E_g[\rho(P)] = 0, \qquad \operatorname{var}_g[\rho(P)] = \frac{1}{n - 1}.$$
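Results (1) and (3) can be checked by brute force for small n. In the Python sketch below (0-based labels, n = 5), the affine maps i → ai + b (mod 5), a ≠ 0, form a doubly transitive subgroup of order 20, while the cyclic shifts form a subgroup that is transitive but not doubly transitive. The choice of groups is ours, for illustration only.

```python
from fractions import Fraction
from itertools import permutations


def d(perm):
    """Spearman's d(P) = sum (i - k_i)^2, with 0-based labels."""
    return sum((i - k) ** 2 for i, k in enumerate(perm))


def mean_var(group):
    """Exact mean and variance of d(P) when P is uniform on the group."""
    vals = [d(p) for p in group]
    mean = Fraction(sum(vals), len(vals))
    return mean, Fraction(sum(v * v for v in vals), len(vals)) - mean**2


n = 5
symmetric = list(permutations(range(n)))
affine = {tuple((a * i + b) % n for i in range(n)) for a in range(1, n) for b in range(n)}
cyclic = {tuple((i + b) % n for i in range(n)) for b in range(n)}

for name, g in [("S_5", symmetric), ("affine", affine), ("cyclic", cyclic)]:
    print(name, len(g), *mean_var(g))
```

The mean is $20 = (5^3 - 5)/6$ in all three cases, as (1) requires of any transitive subgroup; the variance is 100 for the symmetric and affine groups, giving var ρ = 1/4 = 1/(n − 1), but 120 for the cyclic group, which is not doubly transitive.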
defined by Fru Hla P, (re),
640, R) = 0, 

Erie =P, ren 
O(n, R) =l 


The desired confidence coefficient is $1 - P_1 - P_2$.
R′ is the same as R except that the estimation intervals are open. Stevens has demonstrated
Prob[8,r, N C g, R))>1-P,-P,, 0 
Prob le Hr. R) «0* <0,(r, Ne-. -P. ( 
His proof is general enough to be valid whichever of θ or $(\theta_1, \theta_2)$ is considered to be random. He [...]
statement concerning (3.55), which is a function of θ*.
However, by a more roundabout method, stronger results than (3.52a, b) can be demonstrated:
if «0,0, R), then: N Ne-, 
if r. R), then: P(0|R)»1—P, 
foralló P(0| R)- 1—P,—P,. 


considering 
the difference between closed and open intervals is negligible when the confidence coefficient is 
by (2-6) provided that the open intervals are closed at 0 and 1. 
freedom is 16, which may be one or two units in error in the second decimal place. For intermediate
values of f, it is necessary to interpolate with respect to $12/\sqrt{f}$.
For an example of the use of the tables, reference may be made to the earlier paper.
Studies in the history of probability and statistics.
VIII. De Morgan and the statistical study of literary style


By R. D. LORD
Royal College of Science and Technology, Glasgow 


C. B. Williams has recently given an account of two little-known papers published by T. C. Mendenhall
in 1887 and 1901 on the statistical analysis of literary style, but was unable to trace Mendenhall's
reference to de Morgan's suggestion that one might identify an author by the average length


at fault, that the book was in fact Memoir of Augustus de Morgan by his wife Sophia, published in [...],
and that the suggestion occurs in a letter of 1851 to an old Cambridge friend, the Rev. W. Heald. The
rest of the book has no hint that de Morgan ever followed up his idea. Had he done so he would probably
have found, as did Mendenhall, Yule and Williams, that word-length is an unsatisfactory criterion
compared with sentence-length. The letter can be allowed to speak for itself.


Aug. 18, 1851.
Dear Heald,


It has become quite the regular thing for the depth of vacation to remind me, not of you (for anything
that carries my thoughts back to Cambridge does that), but of inquiring how you are getting on, of which
please write speedy word, according to custom, once a year. ...

* * *


If scholars knew the law of averages as well as mathematicians, it would be easy to raise a few hundred
pounds to try this experiment on a grand scale. I would have Greek, Latin, and English tried, and I should
expect to find that one man writing on two different subjects agrees more nearly with himself than two different
men writing on the same subject. Some of these days spurious writings will be detected by this test.
I told you so. With kind regards to all your family, I remain, dear Heald,

Yours sincerely,


A. De Morgan.
REVIEWS


Dictionary of Statistical Terms. By M. G. KENDALL and W. R. BUCKLAND. Edin-
burgh: Oliver and Boyd Ltd., for the International Statistical Institute with the
assistance of UNESCO. 1957. Pp. ix + 493. 25s.


This book constitutes a worthy and useful contribution to the consolidation of statistical terminology.
It became necessary as a result of the vast recent development of statistical theories and their applica-
tion. Great difficulties are involved in such a task, since many authors create their own terminology
and insist on its use. The book contains about 1600 terms with clear definitions. Numerous cross-
references increase the importance of the information.

Instead of attempting to construct a best terminology, the authors mainly reproduce the existing
one. This is bound to lead to certain difficulties. The probability F(x) of a value up to x is called the
distribution function, while probability distribution, frequency distribution and frequency function
stand for the derivative, i.e. the density function. Certain terms are rightly characterised as obsolete
and others as equivalent to better ones.

The formulae included are very helpful for the understanding. In most cases, the original author
and the year are stated. The lack of further bibliographic references is regrettable for contemporary
authors and still more for the history of statistics. A statement (35) such as 'Carli (1764)' is not very
helpful without any indication of where such an article can be found.

The book covers mainly mathematical statistics, but, in addition, it contains some terms used in
quality control, economic, technical, and physical statistics. This part could be increased. The restraint
in including population statistics is reasonable, since the United Nations are bringing out a Multilingual
Demographic Dictionary.

The second part gives the statistical terms used in the French, German, Italian and Spanish lan-
guages in alphabetical arrangement, with English translations and references to the pages where the
English definitions are given. The necessity of including Spanish may be doubted. Since many more
contributions have been made by Russian than by Spanish authors, a future edition should also contain
a Russian dictionary in Cyrillic and Latin letters. The inverse problem, namely, German, French,
and Italian translations of the English terms, is partially solved. An English-German dictionary exists,
and a French translation is on the way.

The authors should be congratulated for their patience and achievements. They have made a great
contribution to international scientific co-operation. E. J. GUMBEL


Statistical Analysis of Stationary Time Series. By U. GRENANDER and M. ROSEN-
BLATT. New York: John Wiley and Sons Inc., London: Chapman and Hall, Ltd.


The authors of this work have done statisticians [...] service in [...] the
diversity of applications of time-series analysis in physics and engineering. Their approach is
concerned almost entirely [...] stationary processes [...], significant of the shift in emphasis
which has [...] from the auto-correlation function to direct estimation of the spectrum.

However, the present writer is unable to share the authors' belief that, in addition to [...] specialists,
this book will also serve the needs of research workers [...] of measure theory and the fact that the
development is very formal, drawing heavily on methods of the theory of functions. Whilst the former
is hardly necessary (although fashionable!), the latter is extremely important and is used with elegance
in this book; but the applied worker will find it very difficult to assimilate page after page of analysis
in order to glean the essential features.

Chapter 1, dealing with the fundamental properties of [...], is one of the best in the book and contains
a series of very interesting [...]. Chapter 2 is the most difficult to read and is concerned with least
squares problems [...]. Chapter 3 constitutes a rather unsuccessful attempt at [...] the auto-correlation
approach to time-series using finite parameter models. Here [...] a very elaborate proof of the
derivation of Fisher's g distribution is given [...] even a mention of the more practical [...]
Chapters 4 and 6 contain the authors' fundamental contributions to the asymptotic distribution
theory of spectral estimates and the evaluation of their biases and variances. It is gratifying to observe 
that the uncertainty principle is not mentioned, but is apparently replaced by a mean-square error 
criterion for comparing various estimators, the latter being the sum of the variance and the bias rather 
than the product as considered in the uncertainty principle. Chapter 5 is concerned essentially with 
the applied mathematics of noise, ocean-wave and turbulence spectra. Some attention is given to 
analogue methods for estimating the spectral density, but it is surprising that no mention is made of 
a method which corresponds exactly to the digital approach. This is to divide the series into a number
of sections and then average harmonic analyses conducted on each subsection. The writer knows of 
at least one instrument based on this principle. 

Chapter 7 deals with regression problems when the residuals are stationary time-series. The inter- 
esting result is proved that in certain regression problems (including the important cases of trigo- 


One does not expect an answer to all the problems, especially as the authors have given many of 
the answers already, but it is reasonable to ask that the difficulties should be brought to the fore and 
not concealed in a maze of mathematics. There is no doubt, however, that this work is a welcome 
addition to the limited number of books on time-series and it is to be hoped that it will be widely read. 


G. M. JENKINS


Psychological Tests and Personnel Decisions. By L. J. CRONBACH and G. C. GLESER.
Illinois: University of Illinois Press, 1957. Pp. 165. $3.50.


The traditional theory of mental testing, as Dr Cronbach and Dr Gleser point out, has for the most 
part been based on the principle (stated most clearly perhaps by Clark Hull) that 'the ultimate purpose
of using aptitude tests is to estimate or forecast aptitudes from test scores’. From Kelly to Gulliksen 
the prime criterion for judging a psychological test has been precision of measurement. Correlation 


mine how far ‘uncertainty’ is reduced—uncertainty being assessed by the mean square error of the 
quantity to be measured, 

Dr Cronbach and Dr Gleser 'propose to abandon' this traditional point of view: for them 'the
ultimate purpose of personnel testing is to arrive at qualitative decisions’. And they insist that con- 


vestigation’ that deals with such problems. Their general method is based on the principles of ‘decision 
theory’ as developed by Abraham Wald; and their object is to show how the principles that he has 
elaborated for use in the economic and industrial field can be applied to psychological and educational 


to the heavier pressure of costs, British firms, education authorities, and government depart- 


Thus, when such methods were first proposed in this country for the purpose of selecting pupils for
'special and secondary schools', it was necessary for the educational psychologist to demonstrate to
those who employed him that the benefits to be anticipated from the new scientific procedures would
more than counterbalance their cost. In some of the earliest reports on the problem, use was made of
actuarial principles in the arguments advanced. The calculations, though rough and ready, were in
fact a forerunner of the more precise and detailed procedures advocated in Psychological Tests and
Personnel Decisions.

Let $l_1$ denote the cost of conducting the annual scholarship examination in accordance with the old
procedure (two printed papers, set and marked by a salaried board of examiners).* If the validity of
the examination were zero, $l_1$ would measure in monetary equivalents the 'loss' incurred by its adop-
tion. If the validity were perfect, we may take $g_1$ to denote the resulting benefit or 'gain'. Let $p_1$ be
the proportion of this gain which, we anticipate, will result from such imperfect validity as the pro-
cedure is found to possess: it may be conveniently calculated as an index-figure, ranging from 0 to 1,
and expressing the probability that the procedure will attain complete success in every case. Then the
'mathematical expectation' will be

$$z_1 \text{ (say)} = p_1 g_1 - (1 - p_1)\,l_1 = p_1(g_1 + l_1) - l_1.$$

With a similar notation for the gain, loss, and weighting to be expected from the new policy proposed
(e.g. adding a printed group test of intelligence for all candidates and an interview for borderline cases),
we may write (for this second alternative)

$$z_2 \text{ (say)} = p_2(g_2 + l_2) - l_2.$$

With varying borderlines and varying groups of candidates, we shall achieve different degrees of
success with each of the two procedures. Briefly, the deciding principle proposed was to examine the
minimum of the two mathematical expectations deducible for all possible cases, and to recommend
that particular procedure which would maximize the minimum mathematical expectation.
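The decision rule just described can be written out in a few lines. The sketch below is only an illustration under stated assumptions: each 'case' is represented by a hypothetical triple (p, g, l), and the figures in the usage lines are invented; nothing in it comes from the reports cited.

```python
def expectation(p, g, l):
    """Mathematical expectation z = p*g - (1 - p)*l = p*(g + l) - l."""
    return p * g - (1 - p) * l


def maximin(procedures):
    """procedures: {name: [(p, g, l), ...]}, one triple per possible case.
    Returns the name whose smallest expectation is largest, with all minima."""
    worst = {name: min(expectation(*case) for case in cases)
             for name, cases in procedures.items()}
    return max(worst, key=worst.get), worst


# Hypothetical figures, for illustration only
best, worst = maximin({
    "old": [(0.6, 100, 10), (0.5, 100, 10)],
    "new": [(0.8, 100, 30), (0.7, 100, 30)],
})
print(best, worst)
```

Here the new procedure costs more in the event of failure, but its worst-case expectation (61) still exceeds that of the old one (45), so the rule recommends it.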

Much the same type of argument was later adopted at the National Institute of Psychology in
suggesting methods of vocational selection to industrial firms, and in recommending methods for
allocating army recruits to various training courses during the war. In each case the deciding factor
was not the absolute 'validity' of the procedure proposed, but the additional information to be expected
as weighed against the additional cost.

At first sight this approach might seem almost the opposite of that adopted by Wald. Wald has
pointed out that such statistical decisions may be compared to 'zero-sum two-person games':
the decision-maker is, as it were, playing against an obstinate and somewhat blind player whom he
designates 'Nature'; and he suggests that the safest policy will be, not [...],
but to minimize the maximum loss. As Cronbach and Gleser observe, this procedure is [...]
assuming that 'if anything can go wrong, it will': 'so pessimistic a view', they [...],
[...] appropriate in statistical decisions, since Nature is presumably indifferent rather than [...].
A conservative principle may be the most prudent in solving the problems [...]
or manufacturer; but for the practical psychologist a more sanguine approach [...]
permissible.

Complex problems in 'selection' confront the [...] in many different [...];
the old controversy of selection for secondary school has cropped up [...] in
connexion with the '11 plus examination'. However, if one may judge [from the British Psychological]
Society's recent Report (P. E. Vernon et al., Selection for Secondary Schools, London, [...]), [...]
nor the educationists have as yet attempted to examine practical [...]


The authors themselves quote Prof. Vernon's estimate of the validity of the best [...] at
the present time, namely, 0.85; and write: 'the correlation [...] between prediction and grammar-
school success in England is far less significant than it seems at first glance. From [...] of
national policy the aim is to maximize total output. The present scheme does this only if the slope
[...] is greater for the [...]'. [...] this are not available, since it has been thought that [...] regarded [...].

The mathematical procedures which Dr Cronbach and Dr Gleser [...] amplification of the classical method of
assessing [...]. They admit that their 'mathematics is involved [...] laborious'. But, as they observe, this is the price
paid for bringing in the various [...] required for [...]: where this price is too
high, the tester can 'obtain approximate answers by using [...] simplifying [...]'.

* Memorandum by the Psychologist to the London County Council on The Use of Intelligence Tests
in Junior County Scholarship Examinations for Free [...] Secondary Schools (1917).
With the formulation they put forward, the writers have no difficulty in showing that, even when
the simplest conditions are postulated, there can still be no simple answer to the query 'How [...]


Having discussed in their opening chapter the general characteristics of 'decision problems' and the
types of 'personnel decisions' that most commonly arise in psychological testing, Dr Cronbach and
Dr Gleser carry out a detailed examination of such problems as the optimum selection ratio, the
optimum length of a single test, the optimum size of a battery of tests, and the 'distribution of effort'
in sequential testing. A special chapter is devoted to the neglected question of the 'bandwidth-
fidelity dilemma', i.e. the difficulties raised by the fact that when the examiner or interviewer [...]


evaluation of outcomes, the book concludes with a chapter on the 'assumptions implicit in the theory
of psychological testing', illuminated by the fresh light thrown on the problems of testing by the
techniques of decision theory. 'The test theory developed to date', they contend, 'covers only a small
corner of the domain within which the decision-maker operates... Wherever we turn, we find un-
answered questions, and research of many types is needed, from simple fact-finding to major in-
ventions.'


CHARLOTTE Wl. 


Regression Analysis of Production Costs and Factory Operation. By P. LYLE.
(Third edition, revised by L. H. C. TIPPETT.) Edinburgh: Oliver and Boyd,
1957. Pp. xiii + 204. 16s.


and are drawn from the author's 
practical experience of the sugar-refining industry. This, the third edition, has been revised by 


term from the long-term changes and brings into account price and wage levels and other economic
factors. There is also a good, though short, chapter which discusses the marginal costs of production.


sensible discussion of how to set about finding the mathematical [...] con-
necting two or three variables when only a series of [...]
available. Another section gives a clear account of [...].
Mr Lyle subsequently took the subject further in two [...].
[An] appendix consists of a discussion of the meaning of [...].


The book is rounded off with a short bibliography, a summary of the equations used, a glossary and
an index. The standard of production is high.
Experimental Designs (second edition). By W. G. COCHRAN and G. M. COX. New
York: John Wiley and Sons, Inc. London: Chapman and Hall Ltd, 1957. Pp. 611.
82s.
The first edition of this book was reviewed in Vol. 38 of this journal (1951, pp. 260-1). In that review
it was remarked that the book should prove especially valuable to statisticians as a com-
pendium of all the more useful designs. In the intervening period there has been considerable recent
activity in the field of experimental design. The authors are, in consequence, no longer able to give
plans of all the more useful designs. As a compromise they have included an exhaustive [...] to those
designs for which a plan is not provided. Two additional chapters have been inserted. One, on frac-
tional replication, corrects a notable, and surprising, deficiency in the first edition. The other contains
material on the study of response surfaces, based for the most part on the work of G. E. P. Box. This
chapter is rather out of keeping with the rest of the book, as is perhaps to be expected of a late addition.
Nevertheless, many statisticians will probably find this chapter, and particularly the attached plans,
of considerable usefulness.

It is evidence of the soundness of the first edition that very little of the original text has been
altered or removed in the present edition. A number of additional paragraphs have been inserted,
usually on points of detail omitted in the first edition. In particular, the inclusion of a section describing
Yates' well-known tabular method of analysing a $2^n$ table may be mentioned. It is unfortunate
that no space has been found for some material on the economic choice of amount of experimentation,
a topic which has attracted some attention in recent years.
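Yates' tabular method mentioned above is a short algorithm. It can be sketched as follows; this is our own Python rendering, assuming the treatment totals are supplied in the standard order (1), a, b, ab, c, .... At each of n passes the first half of the new column holds the sums of successive pairs and the second half their differences.

```python
def yates(responses):
    """Yates' method for a 2**n factorial. `responses` are treatment totals
    in standard order (1), a, b, ab, c, ac, bc, abc, ...; the result holds
    the grand total followed by the effect totals A, B, AB, C, ... ."""
    col = list(responses)
    n = len(col).bit_length() - 1
    if len(col) != 2 ** n:
        raise ValueError("the number of responses must be a power of 2")
    half = len(col) // 2
    for _ in range(n):
        sums = [col[2 * i] + col[2 * i + 1] for i in range(half)]
        diffs = [col[2 * i + 1] - col[2 * i] for i in range(half)]
        col = sums + diffs
    return col


print(yates([1, 3, 2, 6]))  # [12, 6, 4, 2]: total, A, B, AB
```

Dividing each effect total by $2^{n-1}$ times the number of replicates gives the usual effect estimates, and squaring it and dividing by $2^n$ times the number of replicates gives its sum of squares.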

The overall effect of the changes has been to consolidate the position of an already well-established 
text-book. The second edition is about one-third larger than the first, but the price has gone up by
more than three-quarters. Even so, the book is still excellent value for the practising statistician.

N. L. JOHNSON


Statistické metody zemědělského a lesnického výzkumnictví. [Statistical methods
of agricultural and forestry research.] By Václav MYSURVME. Praha: Českoslo-
venská Akademie Zemědělských Věd. 1957. Pp. 555. Kčs 67.


to the pattern familiar in text-books in English, with an account of standard distributions, tests,
analysis of variance and experimental design, and an outline of the mathematical theory. [...]
of statistical tables are included. If a general impression can be trusted, the book is very good;
typography and printing are of a standard seldom attained for text-books in any country.

D. J. FINNEY


An Introduction to Probability Theory and its Applications. Vol. 1 (second
edition). By W. FELLER. New York: John Wiley and Sons Inc. London: Chapman
and Hall Ltd. 1957. Pp. 461. 86s.
1950. There are some alterations but not 
* Fluctua- 


During the seven years since i . 
bability theory connected with the discrete variable. One notes, hopefully, that it is T nid 


Vector Spaces and Matrices. By R. M. THRALL and L. TORNHEIM. New York: John
Wiley and Sons Inc. London: Chapman and Hall Ltd. 1957. Pp. 318.
Although linear algebra is now an important part of every specialist course in mathematics,
most text-books on the subject are too abstract for beginners. A book similar to Birkhoff and Mac Lane's
Survey of Modern Algebra, but dealing with linear algebra in greater detail, is
therefore very welcome. In foregoing the rigidly axiomatic approach, the authors have lost
a little generality, but the gain in clarity and persuasiveness which they achieve easily outweighs
this loss.

The vector theory in this book is mainly finite-dimensional. Matrices express linear transformations,
and their main properties are derived before determinants are introduced. Hermitian and unitary
matrices are associated with a scalar product and the notion of an adjoint. The text is therefore an
excellent basis for the study of representation theory and for the study of Hilbert space. The treatment
of canonical forms, polynomial rings and the decomposition of algebras is exceptionally clear. A con-
cluding chapter deals with linear inequalities and applications to linear programming and rectangular
games; this most interesting chapter is all too short.

The rigorous but unlaboured style of the book should appeal to the serious student. The authors
have evidently given the greatest care to the details of presentation, and they have done this without
spoiling the freshness of their exposition. It is a text which students can be recommended to study and
to imitate. The layout is excellent and the book is singularly free from misprints.


H. KES 


Digital Computer Programming. By D. D. McCRACKEN. New York: John Wiley and
Sons Inc. London: Chapman and Hall Ltd. 1957. Pp. 253. 62s.


The growth of electronic computers over the past ten years has produced a spate of books on the theory
and practice of the subject of computing. Many of these books have been specifically devoted towards
the technique of preparing problems for a machine, the technique commonly known as programming.
Most accounts of programming are written round a particular machine, and the present book is to be
welcomed as a general account of the subject. The illustrations are based on a hypothetical machine,
the characteristics of which are drawn from
a number of well-known computers. The book assumes that the problem has already been reduced [...]


[...] stages: programming, coding the programme, [...]
the programme and the final production of the required solutions.


Coding is dealt with somewhat more briefly. 
ming is common to all machines whilst coding 


ry and would put the re 
in a good position to tackle the programming manual issued with any electronic computer. 


P. G. MOORE 


Individual Differences in Night-Vision Efficiency. [Medical Research Council
Special Report Series, No. 294.] By M. H. PIRENNE, F. H. C. MARRIOTT and E. F.
O'DOHERTY. London: H.M.S.O. 1957. Pp. 83. 8s.
This report arises out of the war-time need to devise tests for selecting individuals with especially
good night vision. In all tests the subject is presented with a visual task to perform at a very low light
intensity and given a score based on his performance. The difficulty is to know just what visual task
to set. In order of increasing complexity the task may consist in:
(1) The correct recognition of a very dim flash of light in an otherwise completely dark room. The
lowest light intensity that can be reliably recognized in this way is called the threshold.
(2) The ability to recognize simple patterns, e.g. letters, correctly at low levels of illumination.
This test of visual acuity is used at higher light intensity by opticians in fitting spectacles.

(3) The ability to perceive correctly a scene or picture containing many objects. Here many non-
visual factors enter the game, for objects are identified partly by their appearance, partly by the degree
to which they 'make sense' in their context. Tests of this kind approximate most closely to practical
life, but are difficult to set and score. Threshold tests, on the other hand, are easy to set and score,
but their practical relevance is not clear.

The authors of this report have performed testa of each kind with great care on à selection of sub- 
jects and have demonstrated that the results of the three types of test are very strongly correlated, 


wo that for practical purposes one may use the most convenient one. The relationship between the teste 
turned out to be a very simple one; the less sensitive individual always required k times as much light
as the more sensitive one. It is just as though his eye contained a filter which transmitted only 1/k
of the incident light, a finding which the authors embody in their ‘filter factor’ theory.


These conclusions are supported by detailed experimental reports which may be of little interest to
the non-specialist. However, statistical readers will be interested in the sections dealing with the
basic physiology of vision in very dim light. Performance near the threshold is characterized by
uncertain seeing, that is, only a fraction of the flashes given are reported by the subject. Naturally
this fraction gets smaller as light intensity is reduced. The interesting thing is that this statistical
fluctuation is shown to arise from the discontinuous nature of light combined with the remarkably
high sensitivity of the eye. So few quanta are emitted in the flash that the inevitable variation in their
number is quite sufficient to account for the variation in the subject's response.
D. R. WILKIE


Life and Other Contingencies, Vol. II. By P. F. Hooker and L. H. Longley-Cook.
Cambridge: Published for the Institute and Faculty of Actuaries at the University
Press. 1957. Pp. 256. 20s.

The first volume of this work was reviewed in Vol. 42 (1955, p. 274) of this journal. In that review it
was mentioned that this is ‘one of a series commissioned by the Institute of Actuaries and the Faculty
of Actuaries to provide a course of reading suitable for the examinations conducted by those ...’.
The present volume should meet this requirement admirably, in ... required of examination candidates
clearly, and in a form which should aid the memory of the average student.

There is necessarily rather greater need for skill in algebraic manipulation in this volume than
in the first volume. The authors have, however, succeeded in ... aspect ... of the consequences
of this assumption.
Interesting features of the book include a discussion of the method of uniform seniority applied to
cases where
(i) μ_x = A + Bc^x, ...;
three chapters on calculations with pensions and widows' and orphans' funds, and a chapter
on disability benefits. There is also an addendum describing the International Actuarial Notation
now officially adopted by the Institute and Faculty.
This is a book for the student and the specialist. There are worked examples ... chapter, but the
value of the book to a student would be enhanced by the provision of exercises and problems to be
attempted. Nevertheless, for the somewhat restricted ... the book should certainly prove a
profitable purchase.
N. L. JOHNSON
CORRIGENDA
Biometrika (1954), 41, pp. 375-89
‘Some further results in the theory of pedestrians and road traffic.’
By ALAN J. MAYNE


I am indebted to A. R. Bloemena, Research Fellow of the Statistical Department of the 


Mathematisch Centrum, Amsterdam, for pointing out an error in the second equation of 
the proof of Lemma 5 in this paper, on p. 383. This equation should read 


+t 
Qv; t) = no f. Q, (o — ut; 0)dG(u). 
Continuing the argument as in the original text, it is found that, writing . 
TGS) &T' (s; 0) = Ls, u; Gu, 
1-T(s) 
L{s, v; Q(z; v; 0)} = ai- (% 
go 0; Qi v; t)) = 1- e (48) (as corrected) 


s a{1—zI(s)] 
-z[1-T 
and Lís,v; Q(z; v)} e 


which is the correct version of equation (47) in Lemma 5.
When this correction is applied, it is found that equation (53) on p. 385 is still valid, so
that no correction is needed for Theorems 4 and 5.
A. J. M.


Biometrika (1956), 43, 433 
‘Confidence intervals for a proportion.’
By EDWIN L. CROW


The displayed equation for R; should read: 


5 qs 
K. n 
THOMAS BAYES: A BIOGRAPHICAL NOTE


By G. A. BARNARD 
Imperial College, London
Bayes's paper, reproduced in the following pages, must be one of the most famous
memoirs in the history of science and the problem it discusses is still the subject of keen
controversy. The intellectual stature of Bayes himself is measured by the fact that it is still
of scientific as well as historical interest to know what Bayes had to say on the questions he
raised. And yet such are the vagaries of historical records, that almost nothing is known
about the personal history of the man. The Dictionary of National Biography, compiled at
the end of the last century, when the whole theory of probability was in temporary eclipse in
England, has an entry devoted to Bayes's father, Joshua Bayes, F.R.S., one of the first six
Nonconformist ministers to be publicly ordained as such in England, but it has nothing on
his much more distinguished son. Indeed, the note on Thomas Bayes which is to appear in
the forthcoming new edition of the Encyclopedia Britannica will apparently be the first
biographical note on Bayes to appear in a work of general reference since the Imperial
Dictionary of Universal Biography was published in Glasgow in 1865. And in treatises on the
history of mathematics, such as that of Loria (1933) and Cantor (1908), notice is taken of his
contributions to probability theory and to mathematical analysis, but biographical details
are lacking.

The Reverend Thomas Bayes, F.R.S., author of the first expression in precise, quantitative
form of one of the modes of inductive inference, was born in 1702, the eldest son of
Ann Bayes and Joshua Bayes, F.R.S. He was educated privately, as was usual with
Nonconformists at that time, and from the fact that when Thomas was 12 Bernoulli wrote to
Leibniz that ‘poor de Moivre’ was having to earn a living in London by teaching mathematics,
we are tempted to speculate that Bayes may have learned mathematics from one of
the founders of the theory of probability. Eventually Thomas was ordained, and began his
ministry by helping his father, who was at the time stated, minister of the Presbyterian
meeting house in Leather Lane, Holborn. Later the son went to minister in Tunbridge Wells,
at the chapel on Little Mount Sion which had been opened on 1 August 1720. It is not known
when Bayes went to Tunbridge Wells, but he was not the first to minister on Little Mount Sion,
and he was certainly there in 1731 when he produced a tract entitled ‘Divine Benevolence,
or an attempt to prove that the Principal End of the Divine
Thomas Bayes's famous Essay is so often referred to in current statistical literature, but so rarely
studied because of the difficulty of access, that the Editors have felt justified in reprinting it in the
Biometrika History of Probability and Statistics series.
294 Studies in the history of probability and statistics. IX
Providence and Government is the happiness of His Creatures’. The tract was published by
John Noon and copies are in Dr Williams's library and the British Museum. The following
is a quotation:


[p. 22]: I don’t find (I am sorry to say it) any necessary connection between mere intelligence, 


though ever so great, and the love or approbation of kind and beneficent actions. 

Bayes argued that the principal end of the Deity was the happiness of His creatures, in
opposition to Balguy and Grove who had, respectively, maintained that the first spring of
action of the Deity was Rectitude, and Wisdom.

In 1736 Noon published a tract entitled ‘An Introduction to the Doctrine of Fluxions,
and a Defence of the Mathematicians against the objections of the Author of the Analyst’.
De Morgan (1860) says: ‘This very acute tract is anonymous, but it was always attributed
to Bayes by the contemporaries who write in the names of the authors as I have seen in
various copies, and it bears his name in other places.’ The ascription to Bayes is accepted
also in the British Museum catalogue.

From the copy in Dr Williams's library we quote: 


[p. 9]: It is not the business of the Mathematician to dispute whether quantities do in fact ever vary in
the manner that is supposed, but only whether the notion of their doing so be intelligible; which being


allowed, he has a right to take it for granted, and then see what deductions he can make from that sup- 
position. It is not the business of a Mathematician to show that a strait line or circle can be drawn, but 
he tells you what he means by these; and if you understand him, you may proceed further with him; and 
it would not be to the purpose to object that there is no such thing in nature as a true strait line or 
perfect circle, for this is none of his concern: he is not inquiring how things are in matter of fact, but 
supposing things to be in a certain way, what are the consequences to be deduced from them; and all that 
is to be demanded of him is, that his suppositions be intelligible, and his inferences just from the sup- 
positions he makes. 

[p. 48]: He [i.e. the Analyst = Bishop Berkeley] represents the disputes and controversies among
mathematicians as disparaging the evidence of their methods: and, Query 51, he represents Logics and
Metaphysics as proper to open their eyes, and extricate them from their difficulties. Now were ever two
things thus put together? If the disputes of the professors of any science disparage the science itself,
Logics and Metaphysics are much more to be disparaged than Mathematics; why, therefore, if I am half
blind, must I take for my guide one that can’t see at all?

[p. 50]: So far as Mathematics do not tend to make men more sober and rational thinkers, wiser and
better men, they are only to be considered as an amusement, which ought not to take us off from serious
business.


This tract may have had something to do with Bayes's election, in 1742, to Fellowship of the
Royal Society, for which his sponsors were Earl Stanhope, Martin Folkes, James Burrow,
Cromwell Mortimer, and John Eames.
William Whiston, Newton's successor in the Lucasian Chair at Cambridge, who was
expelled from the University for Arianism, notes in his Memoirs (p. 390) that ‘on August the
24th this year 1746, being Lord’s Day, and St. Bartholomew’s Day, I breakfasted at Mr Bayes's,
a dissenting Minister at Tunbridge Wells, and a Successor, though not immediate, to
Mr Humphrey Ditton, and like him a very good mathematician also’. Whiston goes on to
relate what he said to Bayes, but he gives no indication that Bayes made reply.
According to Strange (1949) Bayes wished to retire from his ministry as early as 1749,
when he allowed a group of Independents to bring ministers from London to take services in
his chapel week by week, except for Easter, 1750, when he refused his pulpit to one of these
preachers; and in 1752 he was succeeded in his ministry by the Rev. William Johnston, A.M.,
who inherited Bayes’s valuable library. Bayes continued to live in Tunbridge Wells until
his death on 17 April 1761. His body was taken to be buried, with that of his father, mother,
brothers and sisters, in the Bayes and Cotton family vault in Bunhill Fields, the
Nonconformist burial ground by Moorgate. This cemetery also contains the grave of Bayes's friend,
the Unitarian Rev. Richard Price, author of the Northampton Life Table and object of
Burke's oratory and invective in Reflections on the French Revolution, and the graves
of John Bunyan, Samuel Watts, Daniel Defoe, and many other famous men.

Bayes's will, executed on 12 December 1760, shows him to have been a man of substance.
The bulk of his estate was divided among his brothers, sisters, nephews and cousins, but he
left £200 equally between ‘John Boyl late preacher at Newington and now at Norwich, and
Richard Price now I suppose preacher at Newington Green’. He also left ‘To Sarah Jeffery
daughter of John Jeffery, living with her father at the corner of Fountains Lane near
Tonbridge Wells, £500, and my watch made by Elliott and all my linen and wearing apparell
and household stuff.’

Apart from the tracts already noted, and the celebrated Essay reproduced here, Bayes
wrote a letter on Asymptotic Series to John Canton, published in the Philosophical
Transactions of the Royal Society (1763, pp. 269-271). His mathematical work, though small in
quantity, is of the very highest quality; both his tract on fluxions and his paper on
asymptotic series contain thoughts which did not receive as clear expression again until almost
a century had elapsed.

Since copies of the volume in which Bayes's essay first appeared are not rare, and copies of
a photographic reprint issued by the Department of Agriculture, Washington, D.C., U.S.A.,
are fairly widely dispersed, the view has been taken that in preparing Bayes's paper for
publication here some editing is permissible. In particular, the notation has been modernized,
some of the archaisms have been removed and what seem to be obvious printer’s errors have
been corrected. Sometimes, when a word has been omitted in the original, a suggestion has
been supplied, enclosed in square brackets. Otherwise, however, nothing has been changed,
and we hope that while the present text should in no sense be regarded as definitive, it
will be easier to read on that account. All the work of preparing the text for the printer was
most painstakingly and expertly carried out by Mr M. Gilbert, B.Sc., A.R.C.S. Thanks are
also due to the Royal Society for permission to reproduce the Essay in its present form.

In writing the biographical notes the present author has had the friendly help of many
persons, including especially Dr A. Fletcher and Mr R. L. Plackett, of the University of
Liverpool, Mr J. F. C. Willder, of the Department of Pathology, Guy's Hospital Medical
School, and Mr M. E. Ogborn, F.I.A., of the Equitable Life Assurance Society. He would
also like to thank Sir Ronald Fisher, for some initial prodding which set him moving, and
Prof. E. S. Pearson, for patient encouragement to see the matter through to completion.
AN ESSAY TOWARDS SOLVING A PROBLEM IN THE
DOCTRINE OF CHANCES 


By THE LATE REV. MR BAYES, F.R.S.
Communicated by Mr Price, in a Letter to John Canton, A.M., F.R.S.


Read 23 December 1763 
Dear Sir, 


I now send you an essay which I have found among the papers of our deceased friend 


Mr Bayes, and which, in my opinion, has great merit, and well deserves to be preserved. 
Experimental philosophy, you will find, is nearly interested in the subject of it; and on this 


account there seems to be particular reason for thinking that a communication of it to the 


Royal Society cannot be improper. 


He had, you know, the honour of being a member of that illustrious Society, and was much
esteemed by many in it as a very able mathematician. In an introduction which he has
writ to this Essay, he says, that his design at first in thinking on the subject of it was, to
find out a method by which we might judge concerning the probability that an event has to
happen, in given circumstances, upon supposition that we know nothing concerning it but
that, under the same circumstances, it has happened a certain number of times, and failed
a certain other number of times. He adds, that he soon perceived that it would not be very
difficult to do this, provided some rule could be found according to which we ought to
estimate the chance that the probability for the happening of an event perfectly unknown,
should lie between any two named degrees of probability, antecedently to any experiments
made about it; and that it appeared to him that the rule must be to suppose the chance the
same that it should lie between any two equidifferent degrees; which, if it were allowed, all
the rest might be easily calculated in the common method of proceeding in the doctrine of
chances. Accordingly, I find among his papers a very ingenious solution of this problem in
this way. But he afterwards considered, that the postulate on which he had argued might not
perhaps be looked upon by all as reasonable; and therefore he chose to lay down in another
form the proposition in which he thought the solution of the problem is contained, and in
a scholium to subjoin the reasons why he thought so, rather than to take into his
mathematical reasoning any thing that might admit dispute. This, you will observe, is the method
which he has pursued in this essay.

Every judicious person will be sensible that the problem now mentioned is by no means
merely a curious speculation in the doctrine of chances, but necessary to be solved in order
to [provide] a sure foundation for all our reasonings concerning past facts, and what is likely
to be hereafter. Common sense is indeed sufficient to shew us that, from the observation of
what has in former instances been the consequence of a certain cause or action, one may make
a judgment what is likely to be the consequence of it another time, and that the larger [the]
number of experiments we have to support a conclusion, so much the more reason we have
to take it for granted. But it is certain that we cannot determine, at least not to any nicety,
in what degree repeated experiments confirm a conclusion, without the particular discussion
of the beforementioned problem; which, therefore, is necessary to be considered by any

Thomas Bayes 297

one who would give a clear account of the strength of analogical or inductive reasoning;
concerning which, at present, we seem to know little more than that it does sometimes
in fact convince us, and at other times not; and that, as it is the means of [a]cquainting
us with many truths, of which otherwise we must have been ignorant; so it is, in all
probability, the source of many errors, which perhaps might in some measure be avoided, if the
force that this sort of reasoning ought to have with us were more distinctly and clearly
understood.

These observations prove that the problem enquired after in this essay is no less important 
than it is curious. It may be safely added, I fancy, that it is also a problem that has never 
before been solved. Mr De Moivre, indeed, the great improver of this part of mathematics, 
has in his Laws of Chance,* after Bernoulli, and to a greater degree of exactness, given rules 
to find the probability there is, that if a very great number of trials be made concerning any 
event, the proportion of the number of times it will happen, to the number of times it will fail 
in those trials, should differ less than by small assigned limits from the proportion of the 
probability of its happening to the probability of its failing in one single trial. But I know of
no person who has shewn how to deduce the solution of the converse problem to this;
namely, ‘the number of times an unknown event has happened and failed being given, to
find the chance that the probability of its happening should lie somewhere between any two 
named degrees of probability.’ What Mr De Moivre has done therefore cannot be thought 
sufficient to make the consideration of this point unnecessary: especially, as the rules he has 
given are not pretended to be rigorously exact, except on supposition that the number of 
trials made are infinite; from whence it is not obvious how large the number of trials must be 
in order to make them exact enough to be depended on in practice. 

Mr De Moivre calls the problem he has thus solved, the hardest that can be proposed on the 
subject of chance. His solution he has applied to a very important purpose, and thereby 
shewn that those are much mistaken who have insinuated that the Doctrine of Chances in 
mathematics is of trivial consequence, and cannot have a place in any serious enquiry.† The
purpose I mean is, to shew what reason we have for believing that there are in the constitu- 

tion of things fixt laws according to which events happen, and that, therefore, the frame of 
the world must be the effect of the wisdom and power of an intelligent cause; and thus to 
confirm the argument taken from final causes for the existence of the Deity. It will be easy 
to see that the converse problem solved in this essay is more directly applicable to this 
purpose; for it shews us, with distinctness and precision, in every case of any particular order 
or recurrency of events, what reason there is to think that such recurrency or order is derived 
from stable causes or regulations in nature, and not from any of the irregularities of 
chance. 

The two last rules in this essay are given without the deductions of them. I have chosen to
do this because these deductions, taking up a good deal of room, would swell the essay too
much: and also because these rules, though of considerable use, do not answer the purpose for
which they are given as perfectly as could be wished. They are however ready to be produced,
if a communication of them should be thought proper. I have in some places writ short
notes, and to the whole I have added an application of the rules in the essay to some
particular cases, in order to convey a clearer idea of the nature of the problem, and to shew
how far the solution of it has been carried.

* p. 243, etc. He has omitted the demonstrations of his rules, but these have been since supplied
by Mr Simpson at the conclusion of his treatise on The Nature and Laws of Chance.
† See his Doctrine of Chances, p. 252, etc.

I am sensible that your time is so much taken up that I cannot reasonably expect that you
should minutely examine every part of what I now send you. Some of the calculations,
particularly in the Appendix, no one can make without a good deal of labour. I have taken
so much care about them, that I believe there can be no material error in any of them; but
should there be any such errors, I am the only person who ought to be considered as
answerable for them.

Mr Bayes has thought fit to begin his work with a brief demonstration of the general laws
of chance. His reason for doing this, as he says in his introduction, was not merely that his
reader might not have the trouble of searching elsewhere for the principles on which he has
argued, but because he did not know whither to refer him for a clear demonstration of them.
He has also made an apology for the peculiar definition he has given of the word chance or
probability. His design herein was to cut off all dispute about the meaning of the word, which
in common language is used in different senses by persons of different opinions, and according
as it is applied to past or future facts. But whatever different senses it may have, all (he
observes) will allow that an expectation depending on the truth of any past fact, or the


happening of any future event, ought to be estimated so much the more valuable as the fact
is more likely to be true, or the event more likely to happen. Instead therefore, of the proper
sense of the word probability, he has given that which all will allow to be its proper measure
in every case where the word is used. But it is time to conclude this letter. Experimental
philosophy is indebted to you for several discoveries and improvements; and, therefore,
I cannot help thinking that there is a peculiar propriety in directing to you the following
essay and appendix. That your enquiries may be rewarded with many further successes, and
that you may enjoy every valuable blessing, is the sincere wish of, Sir,


your very humble servant,
Richard Price
Newington-Green, 10 November 1763

THE PROBLEM


Given the number of times in which an unknown event has happened and failed: Required
the chance that the probability of its happening in a single trial lies somewhere between
any two degrees of probability that can be named.
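In modern terms the Problem asks for a posterior probability. Under the postulate described in Price's letter, that before any experiment the unknown probability is equally likely to lie between any two equidifferent degrees (a uniform prior on [0, 1]), the required chance after p happenings and q failures is the integral of x^p (1 − x)^q between the two named degrees divided by the same integral from 0 to 1. The Python sketch below is an illustration under that assumption only, not Bayes's own method of evaluation; the names `chance_between` and `_integral` are hypothetical, and the integral is computed exactly by expanding (1 − x)^q binomially.

```python
from fractions import Fraction
from math import comb


def _integral(p: int, q: int, x: Fraction) -> Fraction:
    """Exact integral of t**p * (1 - t)**q for t from 0 to x."""
    # Expand (1 - t)**q = sum_k C(q, k) * (-t)**k and integrate each term.
    return sum(
        (Fraction(comb(q, k) * (-1) ** k, p + k + 1) * x ** (p + k + 1)
         for k in range(q + 1)),
        Fraction(0),
    )


def chance_between(p: int, q: int, lo: Fraction, hi: Fraction) -> Fraction:
    """Chance that the unknown probability lies between lo and hi after
    p happenings and q failures, assuming a uniform prior on [0, 1]."""
    if not (0 <= lo <= hi <= 1):
        raise ValueError("need 0 <= lo <= hi <= 1")
    total = _integral(p, q, Fraction(1))
    return (_integral(p, q, hi) - _integral(p, q, lo)) / total


# One happening, no failures: chance that the probability exceeds 1/2.
print(chance_between(1, 0, Fraction(1, 2), Fraction(1)))  # 3/4
```

Exact rational arithmetic is used so that small cases can be checked by hand; for large p and q a floating-point regularized incomplete beta function would be the practical choice.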


SECTION I


DEFINITION 1. Several events are inconsistent, when if one of them happens, none of the
rest can.

2. Two events are contrary when one, or other of them must; and both together cannot
happen.

3. An event is said to fail, when it cannot happen; or, which comes to the same thing, when
its contrary has happened.

4. An event is said to be determined when it has either happened or failed.

5. The probability of any event is the ratio between the value at which an expectation
depending on the happening of the event ought to be computed, and the value of the thing
expected upon it’s happening.
6. By chance I mean the same as probability.


7. Events are independent when the happening of any one of them does neither increase 
nor abate the probability of the rest. 


Prop. 1

When several events are inconsistent the probability of the happening of one or other of
them is the sum of the probabilities of each of them.

Suppose there be three such events, and whichever of them happens I am to receive N, and
that the probability of the 1st, 2nd, and 3rd are respectively a/N, b/N, c/N. Then (by the
definition of probability) the value of my expectation from the 1st will be a, from the 2nd b,
and from the 3rd c. Wherefore the value of my expectations from all three will be a+b+c.
But the sum of my expectations from all three is in this case an expectation of receiving N
upon the happening of one or other of them. Wherefore (by definition 5) the probability of
one or other of them is (a+b+c)/N or a/N+b/N+c/N. The sum of the probabilities of each
of them.


COROLLARY. If it be certain that one or other of the three events must happen, then
a+b+c = N. For in this case all the expectations together amounting to a certain expecta-
tion of receiving N, their values together must be equal to N. And from hence it is plain that
the probability of an event added to the probability of its failure (or of its contrary) is the
ratio of equality. For these are two inconsistent events, one of which necessarily happens.
Wherefore if the probability of an event is P/N that of it’s failure will be (N−P)/N.
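Prop. 1 and its corollary amount to the additivity of probabilities for inconsistent (mutually exclusive) events. A minimal check with exact fractions follows; the helper names and the values of a, b, c and N are hypothetical, chosen so that a + b + c = N and the corollary applies.

```python
from fractions import Fraction


def expectation_value(prob: Fraction, stake: int) -> Fraction:
    """Definition 5: an expectation of `stake` with probability `prob` is worth prob * stake."""
    return prob * stake


def prob_one_or_other(probs: list[Fraction]) -> Fraction:
    """Prop. 1: for inconsistent events the probabilities of each simply add."""
    return sum(probs, Fraction(0))


N = 12
a, b, c = 3, 4, 5  # hypothetical values; a + b + c = N, so one event must happen
probs = [Fraction(a, N), Fraction(b, N), Fraction(c, N)]

# The values of the three separate expectations add to a + b + c ...
total_value = sum(expectation_value(pr, N) for pr in probs)
# ... so the probability of one or other is (a + b + c)/N = a/N + b/N + c/N.
print(total_value, prob_one_or_other(probs))  # 12 1

# Corollary: the probability of an event plus that of its failure is 1.
P = 5
assert Fraction(P, N) + Fraction(N - P, N) == 1
```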


Prop. 2

If a person has an expectation depending on the happening of an event, the probability of
the event is to the probability of its failure as his loss if it fails to his gain if it happens.

Suppose a person has an expectation of receiving N, depending on an event the probability
of which is P/N. Then (by definition 5) the value of his expectation is P, and therefore
if the event fail, he loses that which in value is P; and if it happens he receives N, but his
expectation ceases. His gain therefore is N−P. Likewise since the probability of the event
is P/N, that of its failure (by corollary prop. 1) is (N−P)/N. But P/N is to (N−P)/N as P is
to N−P, i.e. the probability of the event is to the probability of its failure, as his loss if it
fails to his gain if it happens.
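Prop. 2 says that the odds on an event, P/N to (N − P)/N, equal the ratio of the holder's loss if it fails (P) to his gain if it happens (N − P). The hypothetical helper below checks this identity with exact fractions, assuming a stake N and probability P/N as in the proof, with P strictly between 0 and N so that neither term of the ratio vanishes.

```python
from fractions import Fraction


def prop2_holds(P: int, N: int) -> bool:
    """Check Prop. 2 for an expectation of N that succeeds with probability P/N."""
    if not 0 < P < N:
        raise ValueError("need 0 < P < N so that both odds terms are non-zero")
    loss_if_fails = P          # the expectation, worth P, is lost
    gain_if_happens = N - P    # he receives N but gives up an expectation worth P
    p_happen = Fraction(P, N)
    p_fail = 1 - p_happen      # corollary to Prop. 1
    return p_happen / p_fail == Fraction(loss_if_fails, gain_if_happens)


print(all(prop2_holds(P, 20) for P in range(1, 20)))  # True
```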
Prop. 3

The probability that two subsequent events will both happen is a ratio compounded of the
probability of the 1st, and the probability of the 2nd on supposition the 1st happens.

Suppose that, if both events happen, I am to receive N, that the probability both will
happen is P/N, that the 1st will is a/N (and consequently that the 1st will not is (N−a)/N)
and that the 2nd will happen upon supposition the 1st does is b/N. Then (by definition 5)
P will be the value of my expectation, which will become b if the 1st happens. Consequently
if the 1st happens, my gain by it is b−P, and if it fails my loss is P. Wherefore, by the fore-
going proposition, a/N is to (N−a)/N, i.e. a is to N−a as P is to b−P. Wherefore (com-
ponendo inverse) a is to N as P is to b. But the ratio of P to N is compounded of the ratio of P
to b, and that of b to N. Wherefore the same ratio of P to N is compounded of the ratio of
a to N and that of b to N, i.e. the probability that the two subsequent events will both happen
is compounded of the probability of the 1st and the probability of the 2nd on supposition the
1st happens.
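The key step in Prop. 3 solves a : (N − a) = P : (b − P) for P. Cross-multiplying gives a(b − P) = P(N − a), so ab = PN and P/N = (a/N)(b/N), the product rule for a joint probability. The sketch below reproduces each step with exact fractions; the function name and the numbers a = 6, b = 4, N = 10 are hypothetical.

```python
from fractions import Fraction


def joint_value(a: int, b: int, N: int) -> Fraction:
    """Solve a : (N - a) = P : (b - P) for P, the value of the expectation on both events.

    Cross-multiplying, a*(b - P) = P*(N - a), so a*b = P*N and P = a*b/N.
    """
    return Fraction(a * b, N)


a, b, N = 6, 4, 10  # hypothetical: 1st has probability 6/10, 2nd given 1st has 4/10
P = joint_value(a, b, N)

# Prop. 2 applied to the 1st event: odds a : (N - a) equal loss P : gain (b - P).
assert Fraction(a, N - a) == P / (b - P)
# Componendo inverse: a is to N as P is to b.
assert Fraction(a, N) == P / b
# Hence P/N is compounded (multiplied) of a/N and b/N.
print(P / N, Fraction(a, N) * Fraction(b, N))  # 6/25 6/25
```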
COROLLARY. Hence if of two subsequent events the probability of the 1st be a/N, and the
probability of both together be P/N, then the probability of the 2nd on supposition the
1st happens is P/a.

Prop. 4 


If there be two subsequent events to be determined every day, and each day the probability
of the 2nd is b/N and the probability of both P/N, and I am to receive N if both the events
happen the first day on which the 2nd does; I say, according to these conditions, the
probability of my obtaining N is P/b. For if not, let the probability of my obtaining N be x/N and
let y be to x as N−b to N. Then since x/N is the probability of my obtaining N (by definition 1)
x is the value of my expectation. And again, because according to the foregoing conditions the
first day I have an expectation of obtaining N depending on the happening of both the events
together, the probability of which is P/N, the value of this expectation is P. Likewise, if this
coincident should not happen I have an expectation of being reinstated in my former
circumstances, i.e. of receiving that which in value is x depending on the failure of the 2nd
event the probability of which (by cor. prop. 1) is (N−b)/N or y/x, because y is to x as N−b
to N. Wherefore since x is the thing expected and y/x the probability of obtaining it, the value
of this expectation is y. But these two last expectations together are evidently the same with
my original expectation, the value of which is x, and therefore P+y = x. But y is to x as
N−b is to N. Wherefore x is to P as N is to b, and x/N (the probability of my obtaining N)
is P/b.
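Prop. 4 can be checked two ways in a short sketch: algebraically, as in the proof (x = P + y with y/x = (N − b)/N gives x/N = P/b, independent of N), and by simulating the daily game. The simulation assumes, as the proposition implicitly does, that each day's trial is independent of earlier days; the function names, seed and number of trials are illustrative choices, and the Monte Carlo estimate is only a numerical check, not part of Bayes's argument.

```python
import random
from fractions import Fraction


def prop4_exact(P: int, b: int) -> Fraction:
    """From P + y = x and y : x = (N - b) : N it follows that x/N = P/b."""
    return Fraction(P, b)


def play_once(p_second: float, p_both: float, rng: random.Random) -> bool:
    """One run of the game: each day is an independent trial; stop on the first
    day the 2nd event happens and win if the 1st happened that day too."""
    while True:
        u = rng.random()
        if u < p_both:        # both events happen: receive N
            return True
        if u < p_second:      # only the 2nd happens: the expectation is lost
            return False
        # the 2nd failed: reinstated in the former circumstances; try again tomorrow


def prop4_simulated(P: int, b: int, N: int, trials: int = 200_000, seed: int = 1) -> float:
    """Fraction of simulated games won, for daily probabilities b/N (2nd) and P/N (both)."""
    if not 0 <= P <= b <= N or b == 0:
        raise ValueError("need 0 <= P <= b <= N with b > 0")
    rng = random.Random(seed)
    wins = sum(play_once(b / N, P / N, rng) for _ in range(trials))
    return wins / trials


print(prop4_exact(2, 5), round(prop4_simulated(2, 5, 10), 2))
```

With these settings the simulated frequency should fall close to the exact value 2/5; the standard error for 200,000 games is roughly 0.001.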


Cor. Suppose after the expectation given me in the foregoing proposition, and before it is
at all known whether the 1st event has happened or not, I should find that the 2nd event has 
happened; from hence I can only infer that the event is determined on which my expectation 
depended, and have no reason to esteem the value of my expectation either greater or less 
than it was before. For if I have reason to think it less, it would be reasonable for me to give 
something to be reinstated in my former circumstances, and this over and over again as often 
as I should be informed that the 2nd event had happened, which is evidently absurd. And 
the like absurdity plainly follows if you say I ought to set a greater value on my expectation 
than before, for then it would be reasonable for me to refuse something if offered me upon 
condition I would relinquish it, and be reinstated in my former circumstances; and this 
likewise over and over again as often as (nothing being known concerning the 1st event) it 
should appear that the 2nd had happened. Notwithstanding therefore this discovery that 
the 2nd event has happened, my expectation ought to be esteemed the same in value as 
before, i.e. x, and consequently the probability of my obtaining N is (by definition 5) still
x/N or P/b.* But after this discovery the probability of my obtaining N is the probability
that the 1st of two subsequent events has happened upon the supposition that the 2nd has,
whose probabilities were as before specified. But the probability that an event has happened 
is the same as the probability I have to guess right if I guess it has happened. Wherefore the 
following proposition is evident. 


* What is here said may perhaps be a little illustrated by considering that all that can be lost by the 
happening of the 2nd event is the chance I should have had of being reinstated in my former
circumstances, if the event on which my expectation depended had been determined in the manner expressed in
the proposition. But this chance is always as much against me as it is for me. If the 1st event happens,
it is against me, and equal to the chance for the 2nd event's failing. If the 1st event does not happen,
it is for me, and equal also to the chance for the 2nd event's failing. The loss of it, therefore, can be no 
disadvantage. 
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Prop. 5 
If there be two subsequent events, the probability of the 2nd b/N and the probability of 
both together P/N, and it being first discovered that the 2nd event has happened, from hence 
I guess that the Ist event has also happened, the probability I am in the right is P/b.* 
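The ratio P/b of Prop. 5 can be checked by plain counting. This is only an illustration under assumptions chosen for the purpose: a finite space of equally likely outcomes (two fair dice), and two hypothetical events, `first_event` and `second_event`, standing in for Bayes's 1st and 2nd events.

```python
from fractions import Fraction
from itertools import product

# Illustrative sample space: two fair dice, all 36 outcomes equally likely.
outcomes = list(product(range(1, 7), repeat=2))
N = len(outcomes)


def first_event(o):
    """Hypothetical 1st event: the first die shows a six."""
    return o[0] == 6


def second_event(o):
    """Hypothetical 2nd event: the two dice total at least ten."""
    return o[0] + o[1] >= 10


b = sum(1 for o in outcomes if second_event(o))                     # P(2nd) = b/N
P = sum(1 for o in outcomes if first_event(o) and second_event(o))  # P(both) = P/N

# Prop. 5: once the 2nd is known to have happened, the chance the 1st has is (P/N)/(b/N) = P/b.
prob_right = Fraction(P, N) / Fraction(b, N)

# Cross-check: restrict attention to the outcomes in which the 2nd event happened.
given_second = [o for o in outcomes if second_event(o)]
direct = Fraction(sum(1 for o in given_second if first_event(o)), len(given_second))

print(prob_right, direct)  # 1/2 1/2
```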


Prop. 6 

The probability that several independent events shall all happen is a ratio compounded of 
the probabilities of each. 

For from the nature of independent events, the probability that any one happens is not 
altered by the happening or failing of any of the rest, and consequently the probability that 
the 2nd event happens on supposition the 1st does is the same with its original probability;
but the probability that any two events happen is a ratio compounded of the probability
of the 1st event, and the probability of the 2nd on supposition the 1st happens by prop. 3.
Wherefore the probability that any two independent events both happen is a ratio compounded of the probability of the 1st and the probability of the 2nd. And in like manner
considering the 1st and 2nd events together as one event; the probability that three
independent events all happen is a ratio compounded of the probability that the two 1st
both happen and the probability of the 3rd. And thus you may proceed if there be ever so
many such events; from whence the proposition is manifest.


Cor. 1. If there be several independent events, the probability that the 1st happens, the
2nd fails, the 3rd fails and the 4th happens, etc. is a ratio compounded of the probability of
the 1st, and the probability of the failure of the 2nd, and the probability of the failure of the
3rd, and the probability of the 4th, etc. For the failure of an event may always be considered
as the happening of its contrary.


Cor. 2. If there be several independent events, and the probability of each one be a, and
that of its failing be b, the probability that the 1st happens and the 2nd fails, and the 3rd fails
and the 4th happens, etc. will be abba, etc. For, according to the algebraic way of notation,
if a denote any ratio and b another, abba denotes the ratio compounded of the ratios a, b, b, a.
This corollary therefore is only a particular case of the foregoing.
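Cor. 2 can also be checked by brute force. In this sketch the trial (one throw of a fair die, with the event "happening" on a 1 or a 2, so a = 1/3) is a hypothetical choice; enumerating all 6⁴ equally likely sequences of four independent throws recovers the compounded ratio abba.

```python
from fractions import Fraction
from itertools import product

# Hypothetical trial: one throw of a fair die; the event "happens" on a 1 or a 2.
a = Fraction(2, 6)  # probability of happening in a single trial
b = 1 - a           # probability of failing


def happens(face):
    return face <= 2


pattern = (True, False, False, True)  # 1st happens, 2nd fails, 3rd fails, 4th happens

# Brute force over all 6**4 equally likely sequences of four independent throws.
hits = sum(
    1
    for faces in product(range(1, 7), repeat=len(pattern))
    if all(happens(f) == wanted for f, wanted in zip(faces, pattern))
)
brute = Fraction(hits, 6 ** len(pattern))

# Cor. 2: the same probability, compounded as abba.
compounded = a * b * b * a
print(brute, compounded)  # 4/81 4/81
```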


Definition. If in consequence of certain data there arises a probability that a given
event should happen, its happening or failing, in consequence of these data, I call it's happening or failing in the 1st trial. And if the same data be again repeated, the happening or
failing of the event in consequence of them I call its happening or failing in the 2nd trial; and
so on as often as the same data are repeated. And hence it is manifest that the happening or
failing of the same event in so many different trials, is in reality the happening or failing of
so many distinct independent events exactly similar to each other.


* What is proved by Mr Bayes in this and the preceding proposition is the same with the answer to the following question. What is the probability that a certain event, when it happens, will be accompanied with another to be determined at the same time? In this case, as one of the events is given, nothing can be due for the expectation of it; and, consequently, the value of an expectation depending on the happening of both events must be the same with the value of an expectation depending on the happening of one of them. In other words; the probability that, when one of two events happens, the other will, is the same with the probability of this other. Call x then the probability of this other, and if b/N be the probability of the given event, and p/N the probability of both, because p/N = (b/N) × x, x = p/b = the probability mentioned in these propositions.

Prop. 7

If the probability of an event be a, and that of its failure be b in each single trial, the probability of its happening p times, and failing q times in p + q trials is $Ea^pb^q$ if E be the coefficient of the term in which occurs $a^pb^q$ when the binomial $(a+b)^{p+q}$ is expanded.

For the happening or failing of an event in different trials are so many independent events. Wherefore (by cor. 2 prop. 6) the probability that the event happens the 1st trial, fails the 2nd and 3rd, and happens the 4th, fails the 5th, etc. (thus happening and failing till the number of times it happens be p and the number it fails be q) is abbab etc. till the number of a's be p and the number of b's be q, that is; 'tis $a^pb^q$. In like manner if you consider the event as happening p times and failing q times in any other particular order, the probability for it is $a^pb^q$; but the number of different orders according to which an event may happen or fail, so as in all to happen p times and fail q, in p + q trials is equal to the number of permutations that aaaa bbb admit of when the number of a's is p, and the number of b's is q. And this number is equal to E, the coefficient of the term in which occurs $a^pb^q$ when $(a+b)^{p+q}$ is expanded. The event therefore may happen p times and fail q in p + q trials E different ways and no more, and its happening and failing these several different ways are so many inconsistent events, the probability for each of which is $a^pb^q$, and therefore by prop. 1 the probability that some way or other it happens p times and fails q times in p + q trials is $Ea^pb^q$.
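Prop. 7 combines Cor. 2 with a count of orders. The sketch below, with a hypothetical helper `prob_by_orders`, sums $a^pb^q$ over every order of p happenings and q failings and compares the number of orders with the binomial coefficient from Python's `math.comb`; exact `Fraction` arithmetic is an implementation choice, not part of Bayes's argument.

```python
from fractions import Fraction
from itertools import product
from math import comb


def prob_by_orders(a, p, q):
    """Count the orders of p happenings and q failings, and sum a^p b^q over all of them."""
    b = 1 - a
    orders, total = 0, Fraction(0)
    for seq in product((True, False), repeat=p + q):
        if sum(seq) == p:  # True counts as one happening
            orders += 1
            term = Fraction(1)
            for happened in seq:
                term *= a if happened else b
            total += term
    return orders, total


a, p, q = Fraction(1, 3), 2, 3
orders, total = prob_by_orders(a, p, q)
E = comb(p + q, p)  # coefficient of a^p b^q in (a + b)^(p+q)
print(orders == E, total == E * a ** p * (1 - a) ** q)  # True True
```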


SECTION II 


Postulate. 1. I suppose the square table or plane ABCD to be so made and levelled, that if
either of the balls O or W be thrown upon it, there shall be the same probability that it rests
upon any one equal part of the plane as another, and that it must necessarily rest somewhere
upon it.

2. I suppose that the ball W shall be first thrown, and through the point where it rests
a line os shall be drawn parallel to AD, and meeting CD and AB in s and o; and that afterwards the ball O shall be thrown p + q or n times, and that its resting between AD and os after
a single throw be called the happening of the event M in a single trial. These things supposed:


Lem. 1. The probability that the point o will fall between any two points in the line AB is the ratio of the distance between the two points to the whole line AB.

Let any two points be named, as f and b in the line AB, and through them parallel to AD draw fF, bL meeting CD in F and L. Then if the rectangles Cf, Fb, LA are commensurable to each other, they may each be divided into the same equal parts, which being done, and the ball W thrown, the probability it will rest somewhere upon any number of these equal parts will be the sum of the probabilities it has to rest upon each one of them, because its resting upon any different parts of the plane AC are so many inconsistent events; and this sum, because the probability it should rest upon any one equal part as another is the same, is the probability it should rest upon any one equal part multiplied by the number of
parts. Consequently, the probability there is that the ball W should rest somewhere upon Fb 
is the probability it has to rest upon one equal part multiplied by the number of equal parts 
in Fb; and the probability it rests somewhere upon Cf or LA, i.e. that it does not rest upon Fb 
(because it must rest somewhere upon AC) is the probability it rests upon one equal part 
multiplied by the number of equal parts in Cf, LA taken together. Wherefore, the probability 
it rests upon Fb is to the probability it does not as the number of equal parts in Fb is to the 
number of equal parts in Cf, LA together, or as Fb to Cf, LA together, or as fb to Bf, Ab 
together. Wherefore the probability it rests upon Fb is to the probability it does not as fb to 
Bf, Ab together. And (componendo inverse) the probability it rests upon Fb is to the proba- 
bility it rests upon Fb added to the probability it does not, as fb to AB, or as the ratio of fb to 
AB to the ratio of AB to AB. But the probability of any event added to the probability of its 
failure is the ratio of equality; wherefore, the probability it rests upon Fb is to the ratio of 
equality as the ratio of fb to AB to the ratio of AB to AB, or the ratio of equality; and there- 
fore the probability it rests upon Fb is the ratio of fb to AB. But ex hypothesi according as 
the ball W falls upon Fb or not the point o will lie between f and b or not, and therefore the 
probability the point o will lie between f and b is the ratio of fb to AB. 

Again; if the rectangles Cf, Fb, LA are not commensurable, yet the last mentioned
probability can be neither greater nor less than the ratio of fb to AB; for, if it be less, let it
be the ratio of fc to AB, and upon the line fb take the points p and t, so that pt shall be greater
than fc, and the three lines Bp, pt, tA commensurable (which it is evident may be always
done by dividing AB into equal parts less than half cb, and taking p and t the nearest points
of division to f and b that lie upon fb). Then because Bp, pt, tA are commensurable, so are the
rectangles Cp, Dt, and that upon pt compleating the square AB. Wherefore, by what has
been said, the probability that the point o will lie between p and t is the ratio of pt to AB.
But if it lies between p and t it must lie between f and b. Wherefore, the probability it should
lie between f and b cannot be less than the ratio of pt to AB, and therefore must be greater
than the ratio of fc to AB (since pt is greater than fc). And after the same manner you may
prove that the forementioned probability cannot be greater than the ratio of fb to AB; it
must therefore be the same.
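Lemma 1 can be illustrated by simulation, assuming (as Postulate 1 does) that the ball is equally likely to rest on any equal part of the table. Since only the position of o along AB matters, this sketch draws a single uniform coordinate per throw; `chance_between` is a hypothetical name and the seed is arbitrary.

```python
import random


def chance_between(f, b, length=1.0, throws=200_000, seed=1):
    """Estimate the chance that o, the foot of the thrown ball, falls between f and b on AB."""
    rng = random.Random(seed)
    lo, hi = min(f, b), max(f, b)
    inside = sum(1 for _ in range(throws) if lo <= rng.uniform(0.0, length) <= hi)
    return inside / throws


# With AB = 1, f = 0.25 and b = 0.65, Lemma 1 gives fb/AB = 0.4.
estimate = chance_between(0.25, 0.65)
print(estimate)  # close to 0.4; the sampling error is roughly 0.001
```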

Lem. 2. The ball W having been thrown, and the line os drawn, the probability of the
event M in a single trial is the ratio of Ao to AB.

For, in the same manner as in the foregoing lemma, the probability that the ball O being
thrown shall rest somewhere upon Do or between AD and so is the ratio of Ao to AB. But the
resting of the ball O between AD and so after a single throw is the happening of the event M
in a single trial. Wherefore the lemma is manifest.


Prop. 8
If upon BA you erect the figure BghikmA whose property is this, that (the base BA being divided into any two parts, as Ab and Bb, and at the point of division b a perpendicular being erected and terminated by the figure in m; and y, x, r representing respectively the ratio of bm, Ab, and Bb to AB, and E being the coefficient of the term in which occurs $a^pb^q$ when the binomial $(a+b)^{p+q}$ is expanded) $y = Ex^pr^q$. I say that before the ball W is thrown, the probability the point o should fall between f and b, any two points named in the line AB, and withal that the event M should happen p times and fail q in p + q trials, is the ratio of fghikmb, the part of the figure BghikmA intercepted between the perpendiculars fg, bm raised upon the line AB, to CA the square upon AB.

DEMONSTRATION


For if not; first let it be the ratio of D, a figure greater than fghikmb, to CA, and through the
points e, d, c draw perpendiculars to fb meeting the curve AmigB in h, i, k; the point d being
so placed that di shall be the longest of the perpendiculars terminated by the line fb and the
curve AmigB; and the points e, d, c being so many and so placed that the rectangles bk, ci,
ei, fh taken together shall differ less from fghikmb than D does; all which may be easily done 
by the help of the equation of the curve, and the difference between D and the figure fghikmb 
given. Then since di is the longest of the perpendicular ordinates that insist upon fb, the rest 
will gradually decrease as they are farther and farther from it on each side, as appears from 
the construction of the figure, and consequently eh is greater than gf or any other ordinate 
that insists upon ef. 

Now if Ao were equal to Ae, then by lem. 2 the probability of the event M in a single trial
would be the ratio of Ae to AB, and consequently by cor. Prop. 1 the probability of it's
failure would be the ratio of Be to AB. Wherefore, if x and r be the two forementioned ratios
respectively, by Prop. 7 the probability of the event M happening p times and failing q in
p + q trials would be $Ex^pr^q$. But x and r being respectively the ratios of Ae to AB and Be to
AB, if y is the ratio of eh to AB, then, by construction of the figure AiB, $y = Ex^pr^q$. Wherefore,
if Ao were equal to Ae the probability of the event M happening p times and failing q in
p + q trials would be y, or the ratio of eh to AB. And if Ao were equal to Af, or were any mean
between Ae and Af, the last mentioned probability for the same reasons would be the ratio of
fg or some other of the ordinates insisting upon ef, to AB. But eh is the greatest of all the
ordinates that insist upon ef. Wherefore, upon supposition the point should lie anywhere
between f and e, the probability that the event M happens p times and fails q in p + q trials
cannot be greater than the ratio of eh to AB. There then being these two subsequent events,
the 1st that the point o will lie between e and f, the 2nd that the event M will happen
p times and fail q in p + q trials, and the probability of the first (by lemma 1) is the ratio
of ef to AB, and upon supposition the 1st happens, by what has been now proved, the
probability of the 2nd cannot be greater than the ratio of eh to AB, it evidently follows (from
Prop. 3) that the probability both together will happen cannot be greater than the ratio
compounded of that of ef to AB and that of eh to AB, which compound ratio is the ratio of
fh to CA. Wherefore, the probability that the point o will lie between f and e, and the event
M happen p times and fail q, is not greater than the ratio of fh to CA. And in like manner
the probability the point o will lie between e and d, and the event M happen and fail as before,
cannot be greater than the ratio of ei to CA. And again, the probability the point o will lie
between d and c, and the event M happen and fail as before, cannot be greater than the
ratio of ci to CA. And lastly, the probability that the point o will lie between c and b, and the
event M happen and fail as before, cannot be greater than the ratio of bk to CA. Add now
all these several probabilities together, and their sum (by Prop. 1) will be the probability
that the point will lie somewhere between f and b, and the event M happen p times and fail
q in p + q trials. Add likewise the correspondent ratios together, and their sum will be the
ratio of the sum of the antecedents to their common consequent, i.e. the ratio of fh, ei, ci, bk
together to CA; which ratio is less than that of D to CA, because D is greater than fh, ei, ci, bk
together. And therefore, the probability that the point o will lie between f and b, and withal
that the event M will happen p times and fail q in p + q trials, is less than the ratio of D to CA;
but it was supposed the same, which is absurd. And in like manner, by inscribing rectangles
within the figure, as eg, dh, dk, cm, you may prove that the last mentioned probability is
greater than the ratio of any figure less than fghikmb to CA.
Wherefore, that probability must be the ratio of fghikmb to CA.
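Prop. 8 can be checked numerically by running Bayes's table many times, taking AB = 1 so that CA = 1 and a ratio to CA is simply an area. This is a sketch under those assumptions: `area_under_curve` uses a midpoint rule (a swapped-in numerical method, not Bayes's argument by exhaustion), `simulate_table` is a hypothetical Monte Carlo helper, and the values of f, b, p, q are arbitrary.

```python
import random
from math import comb


def area_under_curve(f, b, p, q, steps=100_000):
    """Midpoint-rule area of y = E x^p (1 - x)^q between x = f and x = b, with AB = 1."""
    E = comb(p + q, p)
    h = (b - f) / steps
    return h * sum(E * x ** p * (1 - x) ** q for x in (f + (i + 0.5) * h for i in range(steps)))


def simulate_table(f, b, p, q, trials=100_000, seed=7):
    """Throw W, then O p + q times; count runs with o in [f, b] and exactly p happenings of M."""
    rng = random.Random(seed)
    joint = 0
    for _ in range(trials):
        o = rng.random()  # position of o along AB, fixed by where W rests
        happenings = sum(rng.random() < o for _ in range(p + q))  # O rests between AD and os
        if f <= o <= b and happenings == p:
            joint += 1
    return joint / trials


f, b, p, q = 0.2, 0.6, 2, 3
print(area_under_curve(f, b, p, q), simulate_table(f, b, p, q))  # both near 0.1203
```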


Cor. Before the ball W is thrown the probability that the point o will lie somewhere
between A and B, or somewhere upon the line AB, and withal that the event M will happen
p times, and fail q in p + q trials is the ratio of the whole figure AiB to CA. But it is certain
that the point o will lie somewhere upon AB. Wherefore, before the ball W is thrown the
probability the event M will happen p times and fail q in p + q trials is the ratio of AiB to CA.


Prop. 9 

If before anything is discovered concerning the place of the point o, it should appear that
the event M had happened p times and failed q in p + q trials, and from hence I guess that the
point o lies between any two points in the line AB, as f and b, and consequently that the
probability of the event M in a single trial was somewhere between the ratio of Ab to AB and
that of Af to AB: the probability I am in the right is the ratio of that part of the figure AiB
described as before which is intercepted between perpendiculars erected upon AB at the
points f and b, to the whole figure AiB.

For, there being these two subsequent events, the first that the point o will lie between
f and b; the second that the event M should happen p times and fail q in p + q trials; and (by
cor. prop. 8) the original probability of the second is the ratio of AiB to CA, and (by prop. 8)
the probability of both is the ratio of fghikmb to CA; wherefore (by prop. 5) it being first
discovered that the second has happened, and from hence I guess that the first has happened
also, the probability I am in the right is the ratio of fghikmb to AiB, the point which was to
be proved.
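Prop. 9 makes the chance a ratio of two areas, so the constant E cancels. The sketch below (hypothetical helper names, AB = 1, midpoint-rule areas) compares that ratio with a simulation that keeps only those runs of the table experiment in which M happened exactly p times, which is one way to read "it being first discovered that the second has happened".

```python
import random


def curve(x, p, q):
    """Ordinate of the figure up to the constant factor E, which cancels in the ratio."""
    return x ** p * (1 - x) ** q


def area(lo, hi, p, q, steps=100_000):
    """Midpoint-rule area under the curve between lo and hi."""
    h = (hi - lo) / steps
    return h * sum(curve(lo + (i + 0.5) * h, p, q) for i in range(steps))


def prop9_chance(f, b, p, q):
    """Chance that o lies between f and b, given p happenings and q failings."""
    return area(f, b, p, q) / area(0.0, 1.0, p, q)


def simulated_prop9(f, b, p, q, trials=200_000, seed=11):
    """Repeat the table experiment; keep only the runs showing exactly p happenings."""
    rng = random.Random(seed)
    kept = inside = 0
    for _ in range(trials):
        o = rng.random()
        if sum(rng.random() < o for _ in range(p + q)) == p:
            kept += 1
            inside += (f <= o <= b)
    return inside / kept


print(prop9_chance(0.2, 0.6, 2, 3), simulated_prop9(0.2, 0.6, 2, 3))
```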


Cor. The same things supposed, if I guess that the probability of the event M lies some- 
where between 0 and the ratio of Ab to AB, my chance to be in the right is the ratio of 
Abm to AiB. 


Scholium

From the preceding proposition it is plain, that in the case of such an event as I there call M,
from the number of times it happens and fails in a certain number of trials, without knowing
anything more concerning it, one may give a guess whereabouts it's probability is, and, by
the usual methods computing the magnitudes of the areas there mentioned, see the chance
that the guess is right. And that the same rule is the proper one to be used in the case of an
event concerning the probability of which we absolutely know nothing antecedently to any
trials made concerning it, seems to appear from the following consideration; viz. that
concerning such an event I have no reason to think that, in a certain number of trials, it
should rather happen any one possible number of times than another. For, on this account,
I may justly reason concerning it as if its probability had been at first unfixed, and then
determined in such a manner as to give me no reason to think that, in a certain number of
trials, it should rather happen any one possible number of times than another. But this is
exactly the case of the event M. For before the ball W is thrown, which determines it's
probability in a single trial (by cor. prop. 8), the probability it has to happen p times and fail
q in p + q or n trials is the ratio of AiB to CA, which ratio is the same when p + q or n is given,
whatever number p is; as will appear by computing the magnitude of AiB by the method
of fluxions.* And consequently before the place of the point o is discovered or the number of
times the event M has happened in n trials, I can have no reason to think it should rather
happen one possible number of times than another.
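The claim of the scholium, that before W is thrown every possible number of happenings in n trials is equally likely, can be checked exactly. This sketch expands $(1-x)^q$ term by term and integrates with exact fractions; the helper names are hypothetical, and ordinary integration stands in for Bayes's "method of fluxions".

```python
from fractions import Fraction
from math import comb


def exact_area(p, q):
    """Exact area under x^p (1 - x)^q on [0, 1], expanding (1 - x)^q term by term."""
    return sum(Fraction((-1) ** k * comb(q, k), p + k + 1) for k in range(q + 1))


def chance_of_p_happenings(p, n):
    """Before W is thrown: E times that area, i.e. the ratio of AiB to CA with AB = 1."""
    return comb(n, p) * exact_area(p, n - p)


n = 7
chances = [chance_of_p_happenings(p, n) for p in range(n + 1)]
print(chances)  # eight entries, each equal to Fraction(1, 8)
```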

In what follows therefore I shall take for granted that the rule given concerning the event 
M in prop. 9 is also the rule to be used in relation to any event concerning the probability of
which nothing at all is known antecedently to any trials made or observed concerning it. 
And such an event I shall call an unknown event. 

Cor. Hence, by supposing the ordinates in the figure AiB to be contracted in the ratio of
E to one, which makes no alteration in the proportion of the parts of the figure intercepted 
between them, and applying what is said of the event M to an unknown event, we have the 
following proposition, which gives the rules for finding the probability of an event from the 
number of times it actually happens and fails. 


Prop. 10
If a figure be described upon any base AH (Vid. Fig.) having for it's equation $y = x^pr^q$; where y, x, r are respectively the ratios of an ordinate of the figure insisting on the base at right angles, of the segment of the base intercepted between the ordinate and A the beginning of the base, and of the other segment of the base lying between the ordinate and the point H, to the base as their common consequent. I say then that if an unknown event has happened p times and failed q in p + q trials, and in the base AH taking any two points as f and t you erect the ordinates fC, tF at right angles with it, the chance that the probability of the event lies somewhere between the ratio of Af to AH and that of At to AH, is the ratio of tFCf, that part of the before-described figure which is intercepted between the two ordinates, to ACFH the whole figure insisting on the base AH.
This is evident from prop. 9 and the remarks made in the foregoing scholium and corollary.
Now, in order to reduce the foregoing rule to practice, we must find the value of the area of the figure described and the several parts of it separated, by ordinates perpendicular to its base. For which purpose, suppose AH = 1 and HO the square upon AH likewise = 1, and Cf will be = y, and Af = x, and Hf = r, because y, x and r denote the ratios of Cf, Af, and Hf respectively to AH. And by the equation of the curve $y = x^pr^q$ and (because Af + fH = AH) r + x = 1. Wherefore
$$y = x^p(1-x)^q = x^p - qx^{p+1} + \frac{q(q-1)}{2}x^{p+2} - \frac{q(q-1)(q-2)}{2\cdot 3}x^{p+3} + \text{etc.}$$
Now the abscisse being x and the ordinate $x^p$ the correspondent area is $x^{p+1}/(p+1)$ (by prop. 10, cas. 1, Quadrat. Newt.)† and the ordinate being $qx^{p+1}$ the area is $qx^{p+2}/(p+2)$; and
* It will be proved presently in art. 4 by computing in the method here mentioned that AiB contracted in the ratio of E to 1 is to CA as 1 to (n + 1)E: from whence it plainly follows that, antecedently to this contraction, AiB must be to CA in the ratio of 1 to n + 1, which is a constant ratio when n is given, whatever p is.


† 'Tis very evident here, without having recourse to Sir Isaac Newton, that the fluxion of the area ACf being
$$y\dot{x} = x^p\dot{x} - qx^{p+1}\dot{x} + \frac{q(q-1)}{2}x^{p+2}\dot{x} - \text{etc.},$$
the fluent or area itself is
$$\frac{x^{p+1}}{p+1} - \frac{qx^{p+2}}{p+2} + \frac{q(q-1)}{2}\cdot\frac{x^{p+3}}{p+3} - \text{etc.}$$

in like manner of the rest. Wherefore, the abscisse being x and the ordinate y or $x^p - qx^{p+1} + \text{etc.}$ the correspondent area is
$$\frac{x^{p+1}}{p+1} - \frac{qx^{p+2}}{p+2} + \frac{q(q-1)x^{p+3}}{2(p+3)} - \frac{q(q-1)(q-2)x^{p+4}}{2\cdot 3(p+4)} + \text{etc.}$$
Wherefore, if x = Af = Af/(AH), and y = Cf = Cf/(AH), then
$$\frac{ACf}{HO} = \frac{x^{p+1}}{p+1} - \frac{qx^{p+2}}{p+2} + \frac{q(q-1)x^{p+3}}{2(p+3)} - \text{etc.}$$
From which equation, if q be a small number, it is easy to find the value of the ratio of ACf to HO; and in like manner as that was found out, it will appear that the ratio of HCf to HO is
$$\frac{r^{q+1}}{q+1} - \frac{pr^{q+2}}{q+2} + \frac{p(p-1)r^{q+3}}{2(q+3)} - \frac{p(p-1)(p-2)r^{q+4}}{2\cdot 3(q+4)} + \text{etc.},$$
which series will consist of few terms and therefore is to be used when p is small.
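The two series of Art. 1 can be compared with a direct numerical quadrature. In this sketch, with AH = 1 and hypothetical helper names, `art1_series` is the series in x for ACf and `art1_series_in_r` the companion series in r = 1 − x for HCf; the midpoint rule serves only as a reference value.

```python
from math import comb


def art1_series(x, p, q):
    """ACf/HO by Art. 1: x^(p+1)/(p+1) - q x^(p+2)/(p+2) + ..., q + 1 terms in all."""
    return sum((-1) ** k * comb(q, k) * x ** (p + k + 1) / (p + k + 1) for k in range(q + 1))


def art1_series_in_r(x, p, q):
    """HCf/HO by the companion series in r = 1 - x, p + 1 terms in all."""
    r = 1 - x
    return sum((-1) ** k * comb(p, k) * r ** (q + k + 1) / (q + k + 1) for k in range(p + 1))


def midpoint_area(lo, hi, p, q, steps=100_000):
    """Reference value: midpoint-rule area under x^p (1 - x)^q between lo and hi."""
    h = (hi - lo) / steps
    return h * sum((lo + (i + 0.5) * h) ** p * (1 - lo - (i + 0.5) * h) ** q for i in range(steps))


p, q, x = 4, 2, 0.3
print(art1_series(x, p, q), midpoint_area(0.0, x, p, q))       # ACf: the two agree
print(art1_series_in_r(x, p, q), midpoint_area(x, 1.0, p, q))  # HCf: the two agree
```

The two parts add up to the whole figure, which for p = 4, q = 2 is 4!·2!/7! = 1/105.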
2. The same things supposed as before, the ratio of ACf to HO is
$$\frac{x^{p+1}r^q}{p+1} + \frac{qx^{p+2}r^{q-1}}{(p+1)(p+2)} + \frac{q(q-1)x^{p+3}r^{q-2}}{(p+1)(p+2)(p+3)} + \frac{q(q-1)(q-2)x^{p+4}r^{q-3}}{(p+1)(p+2)(p+3)(p+4)} + \cdots + \frac{x^{n+1}\,q(q-1)\cdots 1}{(n+1)(p+1)(p+2)\cdots n},$$
where n = p + q. For this series is the same with $x^{p+1}/(p+1) - qx^{p+2}/(p+2) + \text{etc.}$ set down in Art. 1st as the value of the ratio of ACf to HO; as will easily be seen by putting in the former instead of r its value 1 − x, and expanding the terms and ordering them according to the powers of x. Or, more readily, by comparing the fluxions of the two series, and in the former instead of $\dot{r}$ substituting $-\dot{x}$.*
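The identity asserted in Art. 2 can be checked exactly with rational arithmetic. The sketch uses hypothetical names `art1` and `art2`; at x = 1 (so r = 0) only the last term of the Art. 2 series survives, so the same code also checks the value 1/((n + 1)E) that Art. 4 derives for the whole figure.

```python
from fractions import Fraction
from math import comb, prod


def art1(x, p, q):
    """Art. 1: x^(p+1)/(p+1) - q x^(p+2)/(p+2) + ... as an exact sum."""
    return sum(Fraction((-1) ** k * comb(q, k), p + k + 1) * x ** (p + k + 1) for k in range(q + 1))


def art2(x, p, q):
    """Art. 2: sum over k of q(q-1)...(q-k+1) x^(p+k+1) r^(q-k) / ((p+1)(p+2)...(p+k+1))."""
    r = 1 - x
    total = Fraction(0)
    for k in range(q + 1):
        falling = prod(range(q - k + 1, q + 1))  # q(q-1)...(q-k+1), equal to 1 when k = 0
        rising = prod(range(p + 1, p + k + 2))   # (p+1)(p+2)...(p+k+1)
        total += Fraction(falling, rising) * x ** (p + k + 1) * r ** (q - k)
    return total


x = Fraction(2, 7)
for p, q in [(0, 1), (3, 2), (5, 4)]:
    print(p, q, art1(x, p, q) == art2(x, p, q))  # True each time

# Art. 4: at x = 1 only the last term survives, giving 1/((n + 1)E).
p, q = 3, 5
print(art2(Fraction(1), p, q) == Fraction(1, (p + q + 1) * comb(p + q, p)))  # True
```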


3. In like manner, the ratio of HCf to HO is
$$\frac{r^{q+1}x^p}{q+1} + \frac{p\,r^{q+2}x^{p-1}}{(q+1)(q+2)} + \frac{p(p-1)\,r^{q+3}x^{p-2}}{(q+1)(q+2)(q+3)} + \text{etc.}$$


* The fluxion of the first series is
$$x^pr^q\dot{x} + \frac{qx^{p+1}r^{q-1}\dot{r}}{p+1} + \frac{qx^{p+1}r^{q-1}\dot{x}}{p+1} + \frac{q(q-1)x^{p+2}r^{q-2}\dot{r}}{(p+1)(p+2)} + \frac{q(q-1)x^{p+2}r^{q-2}\dot{x}}{(p+1)(p+2)} + \text{etc.},$$
or, substituting $-\dot{x}$ for $\dot{r}$,
$$x^pr^q\dot{x} - \frac{qx^{p+1}r^{q-1}\dot{x}}{p+1} + \frac{qx^{p+1}r^{q-1}\dot{x}}{p+1} - \frac{q(q-1)x^{p+2}r^{q-2}\dot{x}}{(p+1)(p+2)} + \frac{q(q-1)x^{p+2}r^{q-2}\dot{x}}{(p+1)(p+2)} - \text{etc.},$$
which, as all the terms after the first destroy one another, is equal to
$$x^pr^q\dot{x} = x^p(1-x)^q\dot{x} = x^p\dot{x} - qx^{p+1}\dot{x} + \frac{q(q-1)}{2}x^{p+2}\dot{x} - \text{etc.},$$
the fluxion of the latter series, or of $x^{p+1}/(p+1) - qx^{p+2}/(p+2) + \text{etc.}$ The two series therefore are the same.

4. If E be the coefficient of that term of the binomial $(a+b)^{p+q}$ expanded in which occurs $a^pb^q$, the ratio of the whole figure ACFH to HO is $1/\{(n+1)E\}$, n being = p + q. For, when Af = AH, x = 1, r = 0. Wherefore, all the terms of the series set down in Art. 2 as expressing the ratio of ACf to HO will vanish except the last, and that becomes
$$\frac{q(q-1)\cdots 1}{(n+1)(p+1)(p+2)\cdots n}.$$
But E being the coefficient of that term in the binomial $(a+b)^n$ expanded in which occurs $a^pb^q$ is equal to
$$\frac{(p+1)(p+2)\cdots n}{q(q-1)\cdots 1}.$$
And, because Af is supposed to become = AH, ACf = ACH. From whence this article is plain.


5. The ratio of ACf to the whole figure ACFH is (by Art. 1 and 4)
$$(n+1)E\left[\frac{x^{p+1}}{p+1} - \frac{qx^{p+2}}{p+2} + \frac{q(q-1)x^{p+3}}{2(p+3)} - \text{etc.}\right],$$
and if, as x expresses the ratio of Af to AH, X should express the ratio of At to AH; the ratio of AFt to ACFH would be
$$(n+1)E\left[\frac{X^{p+1}}{p+1} - \frac{qX^{p+2}}{p+2} + \frac{q(q-1)X^{p+3}}{2(p+3)} - \text{etc.}\right],$$
and consequently the ratio of tFCf to ACFH is (n + 1)E multiplied into the difference between the two series. Compare this with prop. 10 and we shall have the following practical rule.
Rule 1

If nothing is known concerning an event but that it has happened p times and failed q in p + q or n trials, and from hence I guess that the probability of its happening in a single trial lies somewhere between any two degrees of probability as X and x, the chance I am in the right in my guess is (n + 1)E multiplied into the difference between the series
$$\frac{X^{p+1}}{p+1} - \frac{qX^{p+2}}{p+2} + \frac{q(q-1)X^{p+3}}{2(p+3)} - \text{etc.}$$
and the series
$$\frac{x^{p+1}}{p+1} - \frac{qx^{p+2}}{p+2} + \frac{q(q-1)x^{p+3}}{2(p+3)} - \text{etc.},$$
E being the coefficient of $a^pb^q$ when $(a+b)^n$ is expanded.
This is the proper rule to be used when q is a small number; but if q is large and p small, change everywhere in the series here set down p into q and q into p and x into r or (1 − x), and X into R = (1 − X); which will not make any alteration in the difference between the two series.
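Rule 1 translates directly into code. The sketch below is one reading of it under stated assumptions: probabilities as floats, a hypothetical `rule1(p, q, lower, upper)` in which `lower` plays the part of x and `upper` of X, and the rule's interchange of p and q (with r = 1 − x, R = 1 − X) applied whenever q exceeds p.

```python
from math import comb


def series(x, p, q):
    """x^(p+1)/(p+1) - q x^(p+2)/(p+2) + q(q-1) x^(p+3)/(2(p+3)) - ..., which ends after q + 1 terms."""
    return sum((-1) ** k * comb(q, k) * x ** (p + k + 1) / (p + k + 1) for k in range(q + 1))


def rule1(p, q, lower, upper):
    """Chance that the probability lies between lower (x) and upper (X), after p happenings and q failings."""
    n = p + q
    E = comb(n, p)
    if q <= p:
        return (n + 1) * E * (series(upper, p, q) - series(lower, p, q))
    # q large, p small: interchange p and q, and replace x, X by r = 1 - x, R = 1 - X.
    return (n + 1) * E * (series(1 - lower, q, p) - series(1 - upper, q, p))


print(rule1(1, 0, 0.5, 1.0))  # 0.75, the three-to-one case worked in the appendix
```

Both branches give the same difference, as the rule says; the interchange only shortens the series when q is large.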


Thus far Mr Bayes's essay. 

With respect to the rule here given, it is further to be observed, that when both p and q are 
very large numbers, it will not be possible to apply it to practice on account of the multitude 
of terms which the series in it will contain. Mr Bayes, therefore, by an investigation which it 
would be too tedious to give here, has deduced from this rule another, which is as follows. 


Rule 2
If nothing is known concerning an event but that it has happened p times and failed q in p + q or n trials, and from hence I guess that the probability of its happening in a single trial lies between $p/n + z$ and $p/n - z$; if $m^2 = n^3/(2pq)$, a = p/n, b = q/n, E the coefficient of the term in which occurs $a^pb^q$ when $(a+b)^n$ is expanded, and
$$\Sigma = \frac{(n+1)\sqrt{2pq}}{n\sqrt{n}}\,Ea^pb^q$$
multiplied by the series
$$mz - \frac{m^3z^3}{3} + \frac{(n-2)m^5z^5}{2n\cdot 5} - \frac{(n-2)(n-4)m^7z^7}{2n\cdot 3n\cdot 7} + \frac{(n-2)(n-4)(n-6)m^9z^9}{2n\cdot 3n\cdot 4n\cdot 9} - \text{etc.},$$
my chance to be in the right* is greater than
$$\frac{2\Sigma}{1 + 2Ea^pb^q + 2Ea^pb^q/n}$$
and less than
$$\frac{2\Sigma}{1 - 2Ea^pb^q - 2Ea^pb^q/n},$$
and if p = q my chance is $2\Sigma$ exactly.
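Rule 2 is stated to be exact when p = q, which gives a way to test it against Rule 1. This sketch assumes the reading $m^2 = n^3/(2pq)$ and $\Sigma = (n+1)\sqrt{2pq}/(n\sqrt{n})\cdot Ea^pb^q$ times the series $mz - m^3z^3/3 + (n-2)m^5z^5/(2n\cdot 5) - \dots$, both taken from a damaged scan; the helper names are hypothetical, and the bounds for p ≠ q are not coded.

```python
from math import comb, sqrt


def bayes_series(mz, n):
    """mz - (mz)^3/3 + (n-2)(mz)^5/(2n*5) - (n-2)(n-4)(mz)^7/(2n*3n*7) + ... for even n.

    The coefficient of (mz)^(2k+1)/(2k+1) is C(n/2, k) (2/n)^k with alternating signs,
    which reproduces (n-2)/(2n), (n-2)(n-4)/(2n*3n), ... and stops after n/2 + 1 terms.
    """
    half = n // 2
    return sum(
        (-1) ** k * comb(half, k) * (2 / n) ** k * mz ** (2 * k + 1) / (2 * k + 1)
        for k in range(half + 1)
    )


def rule2_equal(p, z):
    """Rule 2 when p = q: the chance 2*Sigma that the probability lies within p/n +/- z."""
    q, n = p, 2 * p
    a, b = p / n, q / n
    E = comb(n, p)
    m = sqrt(n ** 3 / (2 * p * q))  # assumed reading: m^2 = n^3 / (2pq)
    sigma = (n + 1) * sqrt(2 * p * q) / (n * sqrt(n)) * E * a ** p * b ** q * bayes_series(m * z, n)
    return 2 * sigma


def rule1_exact(p, q, lower, upper, steps=100_000):
    """Reference value: (n+1) E times the area of x^p (1-x)^q between lower and upper (midpoint rule)."""
    n = p + q
    h = (upper - lower) / steps
    area = h * sum((lower + (i + 0.5) * h) ** p * (1 - lower - (i + 0.5) * h) ** q for i in range(steps))
    return (n + 1) * comb(n, p) * area


print(rule2_equal(5, 0.2), rule1_exact(5, 5, 0.3, 0.7))  # the two agree closely
```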

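In order to render this rule fit for use in all cases it is only necessary to know how to find within sufficient nearness the value of $Ea^pb^q$ and also of the series $mz - \frac{1}{3}m^3z^3 + \text{etc.}$ With respect to the former Mr Bayes has proved that, supposing K to signify the ratio of the quadrantal arc to its radius, $Ea^pb^q$ will be equal to $\frac{1}{2}\sqrt{n}/\sqrt{Kpq}$ multiplied by the ratio, [h], whose hyperbolic logarithm is
$$\frac{1}{12}\left[\frac{1}{n} - \frac{1}{p} - \frac{1}{q}\right] - \frac{1}{360}\left[\frac{1}{n^3} - \frac{1}{p^3} - \frac{1}{q^3}\right] + \frac{1}{1260}\left[\frac{1}{n^5} - \frac{1}{p^5} - \frac{1}{q^5}\right] - \frac{1}{1680}\left[\frac{1}{n^7} - \frac{1}{p^7} - \frac{1}{q^7}\right] + \frac{1}{1188}\left[\frac{1}{n^9} - \frac{1}{p^9} - \frac{1}{q^9}\right] - \text{etc.}$$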
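The approximation to $Ea^pb^q$ can be compared with its exact value. This sketch takes K = π/2 (the quadrantal arc divided by its radius) and the numeral coefficients 1/12, 1/360, 1/1260, 1/1680, 1/1188 with alternating signs, which are the standard Stirling-series values; the helper names are hypothetical.

```python
from math import comb, exp, pi, sqrt

# Numeral coefficients of the series for the hyperbolic logarithm of [h], with their signs.
COEFFS = (1 / 12, -1 / 360, 1 / 1260, -1 / 1680, 1 / 1188)


def exact_Eapbq(p, q):
    """E a^p b^q with a = p/n, b = q/n and E the binomial coefficient."""
    n = p + q
    return comb(n, p) * (p / n) ** p * (q / n) ** q


def approx_Eapbq(p, q):
    """(1/2) sqrt(n) / sqrt(K p q) multiplied by h, where K = pi/2."""
    n = p + q
    K = pi / 2
    log_h = sum(
        c * (1 / n ** (2 * j + 1) - 1 / p ** (2 * j + 1) - 1 / q ** (2 * j + 1))
        for j, c in enumerate(COEFFS)
    )
    return 0.5 * sqrt(n) / sqrt(K * p * q) * exp(log_h)


for p, q in [(3, 5), (12, 18), (40, 60)]:
    print(p, q, exact_Eapbq(p, q), approx_Eapbq(p, q))
```

Even for p = 3, q = 5 the two agree to about eight significant figures, which bears out the remark that the series gives the logarithm to a sufficient degree of exactness.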


* In Mr Bayes's manuscript this chance is made to be greater than $2\Sigma/(1 + 2Ea^pb^q)$ and less than $2\Sigma/(1 - 2Ea^pb^q)$. The third term in the two divisors, as I have given them, being omitted. This being evidently owing to a small oversight in the deduction of this rule, which I have reason to think Mr Bayes had himself discovered, I have ventured to correct his copy, and to give the rule as I am satisfied it ought to be given.
This series will generally give the hyperbolic logarithm to a sufficient degree of exactness. A similar series has been given by Mr De Moivre, Mr Simpson and other eminent mathematicians in an expression for the sum of the logarithms of the numbers 1, 2, 3, 4, 5, to x, which sum they have asserted to be equal to
$$\tfrac{1}{2}\log c + \left(x + \tfrac{1}{2}\right)\log x - x + \frac{1}{12x} - \frac{1}{360x^3} + \frac{1}{1260x^5} - \text{etc.},$$
c denoting the circumference of a circle whose radius is unity. But Mr Bayes, in a preceding paper in this volume, has demonstrated that, though this expression will very nearly approach to the value of this sum when only a proper number of the first terms is taken, the whole series cannot express any quantity at all, because, let x be what it will, there will always be a part of the series where it will begin to diverge. This observation, though it does not much affect the use of this series, seems well worth the notice of mathematicians.
Biom. 45
20
where the numeral coefficients may be found in the following manner. Call them A, B, C, D, E etc. Then
$$A = \frac{1}{12}, \quad B = \frac{1}{40} - \frac{A}{3}, \quad C = \frac{1}{84} - \frac{10B + A}{5}, \quad D = \frac{1}{144} - \frac{35C + 21B + A}{7},$$
$$E = \frac{1}{220} - \frac{126C + 84D + 36B + A}{9}, \quad F = \frac{1}{312} - \frac{462D + 330C + 165E + 55B + A}{11},$$
where the coefficients of B, C, D, E, F, etc. in the values of D, E, F, etc. are the 2, 3, 4, etc. highest coefficients in $(a+b)^7$, $(a+b)^9$, $(a+b)^{11}$, etc. expanded; affixing in every particular value the least of these coefficients to B, the next in magnitude to the furthest letter from B, the next to C, the next to the furthest but one, the next to D, the next to the furthest but two, and so on.*

With respect to the value of the series
$$mz - \frac{m^3z^3}{3} + \frac{(n-2)m^5z^5}{2n\cdot 5} - \text{etc.},$$
he has observed that it may be calculated directly when mz is less than 1, or even not greater
than 4/3: but when mz is much larger it becomes impracticable to do this; in which case he
shews a way of easily finding two values of it very nearly equal between which its true value
must lie.

The theorem he gives for this purpose is as follows.
Let K, as before, stand for the ratio of the quadrantal arc to its radius, and H for the ratio
whose hyperbolic logarithm is
$$\frac{2^2-1}{12n} - \frac{2^4-1}{360n^3} + \frac{2^6-1}{1260n^5} - \frac{2^8-1}{1680n^7} + \text{etc.}$$
Then the series $mz - \frac{1}{3}m^3z^3 + \text{etc.}$ will be greater or less than the series


2m222 An 222  1n42 
EE 
(n--1)/2 ^ (nx3)2mz * (n33)(n4-4) Im 
3n3 ( A ej ee ( 2 anny 
n 


^ (n2) (n4-4) (n+ 6) 8m5z5 * (n2) U (n+ Tü 


continued to any number of terms, according as the last term has a positive or a negative 
sign before it. 


From substituting these values of $Ea^pb^q$ and of the series $mz - \frac{1}{3}m^3z^3 + \text{etc.}$ in the second rule arises a third rule, which is the rule to be used when mz is of some considerable magnitude.


* This method of finding these coefficients I have deduced from the demonstration of the third lemma
at the end of Mr Simpson's Treatise on the Nature and Laws of Chance.


Rule 3
If nothing is known of an event but that it has happened p times and failed q in p + q or
n trials, and from hence I judge that the probability of its happening in a single trial lies
between $p/n + z$ and $p/n - z$, my chance to be right is greater than
AV(Kpayh — [og (n 1) (13min 
VVA) + hnt + hn VK (n+2)mz — =) 


and less than 


Vg (en [2m (n-+1) (1 — 2mizt|n is 
V(Kpg) n -hn VK (n2) mz AK (n +2) (n +4) 2m 


where m, K, h and H stand for the quantities already explained.


AN APPENDIX 
Containing an application of the foregoing Rules to some particular Cases 


The first rule gives a direct and perfect solution in all cases; and the two following rules are only particular 
methods of approximating to the solution given in the first rule, when the labour of applying it becomes 
too great. 

The first rule may be used in all cases where either p or q are nothing or not large. The second rule may
be used in all cases where mz is less than 4/3; and the third in all cases where $m^2z^2$ is greater than 1 and less
than $\frac{1}{2}n$, if n is an even number and very large. If n is not large this last rule cannot be much wanted,
because, m decreasing continually as n is diminished, the value of z may in this case be taken large (and
therefore a considerable interval had between $p/n - z$ and $p/n + z$), and yet the operation be carried on by
the second rule; or mz not exceed 4/3.

But in order to shew distinctly and fully the nature of the present problem, and how far Mr Bayes has
carried the solution of it; I shall give the result of this solution in a few cases, beginning with the lowest 
and most simple. 

Let us then first suppose, of such an event as that called M in the essay, or an event about the proba- 
bility of which, antecedently to trials, we know nothing, that it has happened once, and that it is enquired 
what conclusion we may draw from hence with respect to the probability of it’s happening on a second 
trial. 

The answer is that there would be an odds of three to one for somewhat more than an even chance that 
it would happen on a second trial. 

For in this case, and in all others where q is nothing, the expression

$$(n+1)\left(\frac{X^{p+1}}{p+1} - \frac{x^{p+1}}{p+1}\right)$$

gives the solution, as will appear from considering the first rule. Put therefore in this expression
p + 1 = 2, X = 1 and x = ½, and it will be 1 − (½)² or ¾; which shews the chance there is that the probability
of an event that has happened once lies somewhere between 1 and ½; or (which is the same) the odds that
it is somewhat more than an even chance that it will happen on a second trial.*

In the same manner it will appear that if the event has happened twice, the odds now mentioned will be 
seven to one; if thrice, fifteen to one; and in general, if the event has happened p times, there will be 
an odds of $2^{p+1} - 1$ to one, for more than an equal chance that it will happen on further trials.
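The odds just quoted can be checked with exact fractions. With q = 0 (so that n = p) the expression for this case reduces to $X^{p+1} - x^{p+1}$. Taking X = 1 and x = ½ gives the chance $1 - (1/2)^{p+1}$ that the probability exceeds an even chance, and dividing that chance by its complement gives the odds. A minimal Python sketch follows. The helper names are our own.

```python
from fractions import Fraction

def chance_more_likely_than_not(p: int) -> Fraction:
    """Chance, after p happenings and no failures, that the probability exceeds 1/2.

    With q = 0 the expression reduces to X**(p+1) - x**(p+1); here X = 1 and x = 1/2.
    """
    return 1 - Fraction(1, 2) ** (p + 1)

def odds_to_one(chance: Fraction) -> Fraction:
    """Odds for an event of the given chance, expressed as 'so many to one'."""
    return chance / (1 - chance)

for p in (1, 2, 3):
    c = chance_more_likely_than_not(p)
    print(p, c, odds_to_one(c))   # the odds are 2**(p+1) - 1 to one
```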

Again, suppose all I know of an event to be that it has happened ten times without failing, and the
enquiry to be what reason we shall have to think we are right if we guess that the probability of it's
happening in a single trial lies somewhere between 16/17 and 2/3, or that the ratio of the causes of it's
happening to those of it's failure is some ratio between that of sixteen to one and two to one.

Here p + 1 = 11, X = 16/17 and x = 2/3, and $X^{p+1} - x^{p+1} = (16/17)^{11} - (2/3)^{11} = 0.5013$ etc. The answer therefore
is, that we shall have very nearly an equal chance for being right.


* There can, I suppose, be no reason for observing that on this subject unity is always made to stand
for certainty, and ½ for an even chance.

In this manner we may determine in any case what conclusion we ought to draw from a given number
of experiments which are unopposed by contrary experiments. Every one sees in general that there is 
reason to expect an event with more or less confidence according to the greater or less number of times in 
which, under given circumstances, it has happened without failing; but we here see exactly what this 
reason is, on what principles it is founded, and how we ought to regulate our expectations. 

But it will be proper to dwell longer on this head. 

Suppose a solid or die of whose number of sides and constitution we know nothing; and that we are to 
judge of these from experiments made in throwing it. 

In this case, it should be observed, that it would be in the highest degree improbable that the solid 
should, in the first trial, turn any one side which could be assigned beforehand; because it would be 
known that some side it must turn, and that there was an infinity of other sides, or sides otherwise 
marked, which it was equally likely that it should turn. The first throw only shews that it has the side 
then thrown, without giving any reason to think that it has it any one number of times rather than any 
other. It will appear, therefore, that after the first throw and not before, we should be in the circumstances 
required by the conditions of the present problem, and that the whole effect of this throw would be to 
bring us into these circumstances. That is: the turning the side first thrown in any subsequent single trial
would be an event about the probability or improbability of which we could form no judgment, and of
which we should know no more than that it lay somewhere between nothing and certainty. With the
second trial then our calculations must begin; and if in that trial the supposed solid turns again the same
side, there will arise the probability of three to one that it has more of that sort of sides than of all others;
or (which comes to the same) that there is somewhat in its constitution disposing it to turn that side 
oftenest: And this probability will increase, in the manner already explained, with the number of times in 
which that side has been thrown without failing. It should not, however, be imagined that any number 
of such experiments can give sufficient reason for thinking that it would never turn any other side. For, 
suppose it has turned the same side in every trial a million of times. In these circumstances there would
be an improbability that it has less than 1,400,000 more of these sides than all others; but there would
also be an improbability that it had above 1,600,000 times more. The chance for the latter is expressed
by 1,600,000/1,600,001 raised to the millioneth power subtracted from unity, which is equal to 0.4647 etc.,
and the chance for the former is equal to 1,400,000/1,400,001 raised to the same power, or to 0.4895;
which, being both less than an equal chance, proves what I have said. But though it would be thus
improbable that it had above 1,600,000 times more or less than 1,400,000 times more of these sides than of
all others, it by no means follows that we have any reason for judging that the true proportion in this case
lies somewhere between that of 1,600,000 to one and 1,400,000 to one. For he that will take the pains to
make the calculation will find that there is nearly the probability expressed by 0.527, or but little more
than an equal chance, that it lies somewhere between that of 600,000 to one and three millions to one. 
It may deserve to be added, that it is more probable that this proportion lies somewhere between that 
of 900,000 to 1 and 1,900,000 to 1 than between any other two proportions whose antecedents are to one 
another as 900,000 to 1,900,000, and consequent unity. 
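The figures in this passage come from raising k/(k + 1) to the millionth power, as the text describes. The short Python check below uses a helper name of our own, and `math.log1p` keeps the millionth power accurate in floating point.

```python
import math

N = 1_000_000  # number of trials, all of which turned the same side

def chance_ratio_below(k: float, trials: int = N) -> float:
    """(k/(k+1))**trials: the expression used in the text for the chance that the
    proportion of that sort of sides to all others is less than k to one."""
    return math.exp(trials * math.log1p(-1.0 / (k + 1.0)))

above_1_6m = 1.0 - chance_ratio_below(1_600_000)   # above 1,600,000 to one
below_1_4m = chance_ratio_below(1_400_000)         # below 1,400,000 to one
between = chance_ratio_below(3_000_000) - chance_ratio_below(600_000)
print(round(above_1_6m, 4), round(below_1_4m, 4), round(between, 4))
# -> 0.4647 0.4895 0.5277
```

The third value, about 0.5277, is the one the text gives as nearly 0.527.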

I have made these observations chiefly because they are all strictly applicable to the events and 
appearances of nature. Antecedently to all experience, it would be improbable as infinite to one, that 
any particular event, beforehand imagined, should follow the application of any one natural object to 
another; because there would be an equal chance for any one of an infinity of other events. But if we had 
once seen any particular effects, as the burning of wood on putting it into fire, or the falling of a stone 
on detaching it from all contiguous objects, then the conclusions to be drawn from any number of sub- 
sequent events of the same kind would be to be determined in the same manner with the conclusions just 
mentioned relating to the constitution of the solid I have supposed. In other words, the first experiment
supposed to be ever made on any natural object would only inform us of one event that may follow 
a particular change in the circumstances of those objects; but it would not suggest to us any ideas of 
uniformity in nature, or give us the least reason to apprehend that it was, in that instance or in any other, 
regular rather than irregular in its operations. But if the same event has followed without interruption in 
any one or more subsequent experiments, then some degree of uniformity will be observed; reason will be
given to expect the same success in further experiments, and the calculations directed by the solution of
this problem may be made.


One example here it will not be amiss to give. 

Let us imagine to ourselves the case of a person just brought forth into this world, and left to collect 
from his observation of the order and course of events what powers and causes take place in it. The Sun 
would, probably, be the first object that would engage his attention; but after losing it the first night he 
would be entirely ignorant whether he should ever see it again. He would therefore be in the condition of 
a person making a first experiment about an event entirely unknown to him. But let him see a second 
appearance or one return of the Sun, and an expectation would be raised in him of a second return, and he 
might know that there was an odds of 3 to 1 for some probability of this. This odds would increase, as
before represented, with the number of returns to which he was witness. But no finite number of returns
would be sufficient to produce absolute or physical certainty. For let it be supposed that he has seen it
return at regular and stated intervals a million of times. The conclusions this would warrant would be
such as follow. There would be the odds of the millioneth power of 2, to one, that it was likely that it would
return again at the end of the usual interval. There would be the probability expressed by 0.5352, that
the odds for this was not greater than 1,600,000 to 1; and the probability expressed by 0.5105, that it was
not less than 1,400,000 to 1.

It should be carefully remembered that these deductions suppose a previous total ignorance of nature. 
After having observed for some time the course of events it would be found that the operations of nature 
are in general regular, and that the powers and laws which prevail in it are stable and permanent. The 
consideration of this will cause one or a few experiments often to produce a much stronger expectation of 
success in further experiments than would otherwise have been reasonable; just as the frequent observa- 
tion that things of a sort are disposed together in any place would lead us to conclude, upon discovering 
there any object of a particular sort, that there are laid up with it many others of the same sort. It is 
obvious that this, so far from contradicting the foregoing deductions, is only one particular case to which 
they are to be applied. 

What has been said seems sufficient to shew us what conclusions to draw from uniform experience. It 
demonstrates, particularly, that instead of proving that events will always happen agreeably to it, there 
will be always reason against this conclusion. In other words, where the course of nature has been the 
most constant, we can have only reason to reckon upon a recurrency of events proportioned to the degree 
of this constancy; but we can have no reason for thinking that there are no causes in nature which will
ever interfere with the operations of the causes from which this constancy is derived, or no circumstances 
of the world in which it will fail. And if this is true, supposing our only data derived from experience, 
we shall find additional reason for thinking thus if we apply other principles, or have recourse to such 
considerations as reason, independently of experience, can suggest. 

But I have gone further than I intended here; and it is time to turn our thoughts to another branch of 
this subject: I mean, to cases where an experiment has sometimes succeeded and sometimes failed. 

Here, again, in order to be as plain and explicit as possible, it will be proper to put the following case, 
which is the easiest and simplest I can think of.

Let us then imagine a person present at the drawing of a lottery, who knows nothing of its scheme or of
the proportion of Blanks to Prizes in it. Let it further be supposed, that he is obliged to infer this from the
number of blanks he hears drawn compared with the number of prizes; and that it is enquired what
conclusions in these circumstances he may reasonably make.

Let him first hear ten blanks drawn and one prize, and let it be enquired what chance he will have for
being right if he guesses that the proportion of blanks to prizes in the lottery lies somewhere between the
proportions of 9 to 1 and 11 to 1.

Here taking X = 11/12, x = 9/10, p = 10, q = 1, n = 11, E = 11, the required chance, according to the first
rule, is (n + 1)E multiplied by the difference between

$$\frac{X^{p+1}}{p+1} - \frac{qX^{p+2}}{p+2} \quad\text{and}\quad \frac{x^{p+1}}{p+1} - \frac{qx^{p+2}}{p+2},$$

which is 0.07699 etc. There would therefore be an odds of about 923 to 76, or nearly 12 to 1 against his
being right. Had he guessed only in general that there were less than 9 blanks to a prize, there would have
been a probability of his being right equal to 0.6589, or the odds of 65 to 34.
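The first rule can be written out mechanically for any finite q. The sketch below is our own Python rendering of the expression used here, $(n+1)E\,[F(X) - F(x)]$ with $F(y) = \sum_{k=0}^{q} (-1)^k \binom{q}{k}\, y^{p+1+k}/(p+1+k)$, evaluated with exact fractions. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def first_rule_chance(p: int, q: int, x, X) -> Fraction:
    """Chance that the probability of a blank lies between x and X after p blanks and q prizes.

    Computes (n+1) E [F(X) - F(x)] with E = C(n, p) and
    F(y) = y^(p+1)/(p+1) - q y^(p+2)/(p+2) + ..., the series written out for finite q.
    """
    n = p + q
    E = comb(n, p)

    def F(y):
        y = Fraction(y)
        return sum((-1) ** k * comb(q, k) * y ** (p + 1 + k) / (p + 1 + k)
                   for k in range(q + 1))

    return (n + 1) * E * (F(X) - F(x))

between = first_rule_chance(10, 1, Fraction(9, 10), Fraction(11, 12))
below_9 = first_rule_chance(10, 1, 0, Fraction(9, 10))
print(f"{float(between):.5f} {float(below_9):.4f}")
```

It gives about 0.07699 for the guess of between 9 to 1 and 11 to 1. For the general guess of fewer than 9 blanks to a prize it gives about 0.6590, against 0.6589 in the text.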
Again, suppose that he has heard 20 blanks drawn and 2 prizes; what chance will he have for being right
if he makes the same guess?

Here X and x being the same, we have n = 22, p = 20, q = 2, E = 231, and the required chance, found by the
first rule as before, is

$$(n+1)E\left[\left(\frac{X^{p+1}}{p+1} - \frac{qX^{p+2}}{p+2} + \frac{q(q-1)}{2}\,\frac{X^{p+3}}{p+3}\right) - \left(\frac{x^{p+1}}{p+1} - \frac{qx^{p+2}}{p+2} + \frac{q(q-1)}{2}\,\frac{x^{p+3}}{p+3}\right)\right] = 0.10843 \text{ etc.}$$

He will, therefore, have a better chance for being right than in the former instance, the odds against
him now being 892 to 108, or about 9 to 1. But should he only guess in general, as before, that there were
less than 9 blanks to a prize, his chance will be worse; for instead of 0.6589 it will be 0.584.

Suppose, further, that he has heard 40 blanks drawn and 4 prizes; what will the before-mentioned
chances be?
The answer here is 0.1525, for the former of these chances; and 0.527, for the latter. There will, therefore,
now be an odds of only 5½ to 1 against the proportion of blanks to prizes lying between 9 to 1 and
11 to 1; and but little more than an equal chance that it is less than 9 to 1.

Once more. Suppose he has heard 100 blanks drawn and 10 prizes. 

The answer here may still be found by the first rule; and the chance for a proportion of blanks to prizes 
less than 9 to 1 will be 0.44109, and for a proportion greater than 11 to 1, 0.3082. It would therefore
be likely that there were not fewer than 9 or more than 11 blanks to a prize. But at the same time it will
remain unlikely* that the true proportion should lie between 9 to 1 and 11 to 1, the chance for this being
0.2506 etc. There will therefore be still an odds of near 3 to 1 against this.
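For larger p and q there is a quicker check. It relies on a standard identity, not on the essay's method: the chance that the probability lies below y equals the chance that a binomial variable with n + 1 trials and success probability y is at least p + 1. Below is a minimal Python sketch under that identity, with a uniform prior as in the essay. The function name is ours.

```python
from math import comb

def chance_below(p: int, q: int, y: float) -> float:
    """Chance that the probability of a blank is below y, after p blanks and q prizes.

    Uses I_y(p+1, q+1) = P[Binomial(p+q+1, y) >= p+1], where I_y is the regularised
    incomplete beta function (uniform prior on the probability).
    """
    trials = p + q + 1
    return sum(comb(trials, j) * y ** j * (1 - y) ** (trials - j)
               for j in range(p + 1, trials + 1))

# 40 blanks and 4 prizes
below_9_40 = chance_below(40, 4, 9 / 10)
between_40 = chance_below(40, 4, 11 / 12) - below_9_40
# 100 blanks and 10 prizes
below_9_100 = chance_below(100, 10, 9 / 10)
print(round(between_40, 4), round(below_9_40, 4), round(below_9_100, 5))
```

Run as written, it should come out close to the 0.1525 and 0.527 quoted for 40 blanks and 4 prizes, and to the 0.44109 quoted for 100 blanks and 10 prizes.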

From these calculations it appears that, in the circumstances I have supposed, the chance for being
right in guessing the proportion of blanks to prizes to be nearly the same with that of the number of blanks 
drawn in a given time to the number of prizes drawn, is continually increasing as these numbers increase; 
and that therefore, when they are considerably large, this conclusion may be looked upon as morally 
certain. By parity of reason, it follows universally, with respect to every event about which a great 
number of experiments has been made, that the causes of its happening bear the same proportion to the 
causes of its failing, with the number of happenings to the number of failures; and that, if an event whose 
causes are supposed to be known, happens oftener or seldomer than is agreeable to this conclusion, there 
will be reason to believe that there are some unknown causes which disturb the operations of the known 
ones. With respect, therefore, particularly to the course of events in nature, it appears, that there is 
demonstrative evidence to prove that they are derived from permanent causes, or laws originally estab- 
lished in the constitution of nature in order to produce that order of events which we observe, and not 
from any of the powers of chance. This is just as evident as it would be, in the case I have insisted on, 
that the reason of drawing 10 times more blanks than prizes in millions of trials, was, that there were in 
the wheel about so many more blanks than prizes. 

But to proceed a little further in the demonstration of this point. 

We have seen that supposing a person, ignorant of the whole scheme of a lottery, should be led to 
conjecture, from hearing 100 blanks and 10 prizes drawn, that the proportion of blanks to prizes in the 
lottery was somewhere between 9 to 1 and 11 to 1, the chance for his being right would be 0.2506 etc.
Let [us] now enquire what this chance would be in some higher cases. 

Let it be supposed that blanks have been drawn 1000 times, and prizes 100 times in 1100 trials. 

In this case the powers of X and x rise so high, and the number of terms in the two series

$$\frac{X^{p+1}}{p+1} - \frac{qX^{p+2}}{p+2} + \text{etc.} \quad\text{and}\quad \frac{x^{p+1}}{p+1} - \frac{qx^{p+2}}{p+2} + \text{etc.}$$

become so numerous that it would require immense labour to obtain the answer by the first rule. 'Tis
necessary, therefore, to have recourse to the second rule. But in order to make use of it, the interval
between X and x must be a little altered. 10/11 − 9/10 is 1/110, and therefore the interval between 10/11 − 1/110 and
10/11 + 1/110 will be nearly the same with the interval between 9/10 and 11/12, only somewhat larger. If then we
make the question to be; what chance there would be (supposing no more known than that blanks have
been drawn 1000 times and prizes 100 times in 1100 trials) that the probability of drawing a blank in a
single trial would lie somewhere between 10/11 − 1/110 and 10/11 + 1/110, we shall have a question of the same kind
with the preceding questions, and deviate but little from the limits assigned in them.
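The adjustment of the interval can be verified with exact fractions. In the Python sketch below the variable names are ours.

```python
from fractions import Fraction as F

centre = F(10, 11)            # p/n = 1000/1100
z = centre - F(9, 10)         # half-width chosen so that the lower limit is exactly 9/10
upper = centre + z
print(z, upper, F(11, 12))    # 1/110 101/110 11/12: the upper limit is a little beyond 11/12
print(upper / (1 - upper))    # blanks to prizes at the upper limit: 101/9, i.e. 1111 to 99
```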
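The answer, according to the second rule, is that this chance is greater than

$$\frac{2\Sigma}{1 + 2Ea^pb^q + \dfrac{2Ea^pb^q}{n}}$$

and less than

$$\frac{2\Sigma}{1 - 2Ea^pb^q - \dfrac{2Ea^pb^q}{n}},$$

Σ being $= \dfrac{(n+1)\sqrt{2pq}}{n\sqrt{n}}\,Ea^pb^q\left(mz - \dfrac{m^3z^3}{3} + \text{etc.}\right)$.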


* I suppose no attentive person will find any difficulty in this. It is only saying that, supposing the
interval between nothing and certainty divided into a hundred equal chances, there will be 44 of them
for a less proportion of blanks to prizes than 9 to 1, 31 for a greater than 11 to 1, and 25 for some
proportion between 9 to 1 and 11 to 1; in which it is obvious that, though one of these suppositions must be true,
yet, having each of them more chances against them than for them, they are all separately unlikely. 

† See Mr De Moivre's Doctrine of Chances, page 250.


By making here 1000 = p, 100 = q, 1100 = n, 1/110 = z, $m = \sqrt{n^3/pq}$ and $2Ea^pb^q = h\sqrt{n}/\sqrt{Kpq}$,
h being the ratio whose hyperbolic logarithm is

$$\frac{1}{12}\left(\frac{1}{n}-\frac{1}{p}-\frac{1}{q}\right) - \frac{1}{360}\left(\frac{1}{n^3}-\frac{1}{p^3}-\frac{1}{q^3}\right) + \frac{1}{1260}\left(\frac{1}{n^5}-\frac{1}{p^5}-\frac{1}{q^5}\right) - \text{etc.},$$

and K the ratio of the quadrantal arc to radius; the former of these expressions will be found to be
0.7953, and the latter 0.9405 etc. The chance enquired after, therefore, is greater than 0.7953, and less
than 0.9405. That is; there will be an odds for being right in guessing that the proportion of blanks to
prizes lies nearly between 9 to 1 and 11 to 1, (or exactly between 9 to 1 and 1111 to 99), which is greater than
4 to 1, and less than 16 to 1.
Suppose, again, that no more is known than that blanks have been drawn 10,000 times and prizes 1000 
times in 11,000 trials; what will the chance now mentioned be? 
Here the second as well as the first rule becomes useless, the value of mz being so great as to render
it scarcely possible to calculate directly the series

$$mz - \frac{m^3z^3}{3} + \frac{(n-2)m^5z^5}{2n\cdot 5} - \text{etc.}$$

The third rule, therefore, must be used; and the information it gives us is, that the required
chance is greater than 0.97421, or more than an odds of 40 to 1.
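With a machine the second rule's series can simply be summed, even in the case the text sets aside as too laborious. The Python sketch below rests on two assumptions. It takes $m^2 = n^3/pq$ and $\Sigma = \frac{n+1}{n}\sqrt{2pq/n}\,Ea^pb^q\,(mz - m^3z^3/3 + \dots)$ as the second rule's definitions. It also computes $Ea^pb^q$ exactly through `math.lgamma` rather than through h and K. The function name `rule2_bounds` is hypothetical.

```python
import math

def rule2_bounds(p: int, q: int, z: float, max_terms: int = 2000):
    """Lower and upper second-rule bounds for the chance that the probability
    lies between p/n - z and p/n + z.

    Assumed forms (our reading of the second rule):
      m**2 = n**3 / (p*q),
      Sigma = ((n+1)/n) * sqrt(2*p*q/n) * E a^p b^q * S,
      S = mz - m^3 z^3/3 + (n-2) m^5 z^5/(2n*5) - ...
    """
    n = p + q
    a, b = p / n, q / n
    Y = math.sqrt(n ** 3 / (p * q)) * z        # Y = mz
    # E a^p b^q is the binomial term C(n, p) a^p b^q; work in logs to avoid overflow.
    Eab = math.exp(math.lgamma(n + 1) - math.lgamma(p + 1) - math.lgamma(q + 1)
                   + p * math.log(a) + q * math.log(b))
    # Sum S term by term; u holds C(n/2, k) * (-2/n)**k * Y**(2k+1).
    S, u = 0.0, Y
    for k in range(max_terms):
        S += u / (2 * k + 1)
        u *= Y * Y * (n / 2 - k) / (k + 1) * (-2.0 / n)
        if abs(u) < 1e-18:
            break
    two_sigma = 2 * (n + 1) / n * math.sqrt(2 * p * q / n) * Eab * S
    lower = two_sigma / (1 + 2 * Eab + 2 * Eab / n)
    upper = two_sigma / (1 - 2 * Eab - 2 * Eab / n)
    return lower, upper

low_1100, up_1100 = rule2_bounds(1000, 100, 1 / 110)
low_11000, _ = rule2_bounds(10000, 1000, 1 / 110)
print(round(low_1100, 4), round(up_1100, 4), round(low_11000, 4))
```

For 1000 blanks and 100 prizes the two bounds should come out close to the 0.7953 and 0.9405 of the text. For 10,000 blanks and 1000 prizes only the lower bound is informative, because the upper expression exceeds unity. That lower bound should land near the 0.97421 which the text obtains from the third rule. Note that this is a direct evaluation of the second rule, not of the third.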

By calculations similar to these may be determined universally, what expectations are warranted by 
any experiments, according to the different number of times in which they have succeeded and failed; or 
what should be thought of the probability that any particular cause in nature, with which we have any 
acquaintance, will or will not, in any single trial, produce an effect that has been conjoined with it. 

Most persons, probably, might expect that the chances in the specimen I have given would have been 
greater than I have found them. But this only shews how liable we are to error when we judge on this 
subject independently of calculation. One thing, however, should be remembered here; and that is, the 
narrowness of the interval between 9/10 and 11/12, or between 10/11 − 1/110 and 10/11 + 1/110. Had this interval been
taken a little larger, there would have been a considerable difference in the results of the calculations.
Thus had it been taken double, or z = 1/55, it would have been found in the fourth instance that instead of
odds against there were odds for being right in judging that the probability of drawing a blank in a single
trial lies between 10/11 + 1/55 and 10/11 − 1/55.

The foregoing calculations further shew us the uses and defects of the rules [...]. It
is evident that the two last rules do not give us the required chances within such narrow limits as could
[...] considered, that these limits become narrower and narrower as q is
taken larger in respect of p; [...] second rule. These two rules therefore afford [...]*
till some person shall discover a better approximation to the value of the two series in the first rule.


But what most of all recommends the solution in this Essay is, that it is [...] in those cases where
information is most wanted, and where Mr De Moivre's solution of the inverse problem can give [...]
no direction; I mean, in all cases where either p or q are of no [...] when both p and q are very considerable,
[...] it is not difficult to perceive [...]
demonstrated, or that there is reason to believe in general that the chances for the happening of [...]
are to the [...] for its failure in the same ratio with that of p to q. But we shall be greatly [...]
judge in this manner when either p or q are small. And tho' in such cases [...]
discover the exact probability of an event, yet it is very agreeable to be able to [...]
which it is reasonable to think it must lie, and also to be able to determine the precise degree [...]
which is due to any conclusions or assertions relating to them.

* Since this was written I have found out a method of considerably [...]
the second and third rules by demonstrating that the expression $2\Sigma/\{1 + 2Ea^pb^q \ldots\}$ [...]
almost as near to the true value wanted as there is reason to desire, only always somewhat [...]
necessary to hint this here; though the proof of it cannot be given.
THE PROPERTIES OF A STOCHASTIC MODEL
FOR TWO COMPETING SPECIES 


By P. H. LESLIE and J. C. GOWER


Bureau of Animal Population, Department of Zoological Field Studies, Oxford 
and Rothamsted Experimental Station 


1. INTRODUCTION 


A stochastic model for studying the properties of certain biological systems by numerical 
methods has been described in an earlier paper (Leslie, 1958), to which reference should be 
made for the full details of its development. Two varieties of this model for the case of two 
competing species have been programmed for the Elliot-N.R.D.C. 401 computer in the 
Statistical Department of Rothamsted Experimental Station, and some of the results 
obtained are given in the following paper. 


2. THE MODELS USED 


Suppose that two species $S_1$ and $S_2$ are competing together in a limited environment, and
that their populations consist of $N_1(t)$ and $N_2(t)$ individuals at time t. Then it is assumed that
the expected balance of births and deaths in the two populations during the discrete interval
of time t to t + 1 is defined by the deterministic model,

$$N_1(t+1) = \frac{\lambda_1 N_1(t)}{q_1(t)} = \Lambda_1(t)\,N_1(t), \qquad N_2(t+1) = \frac{\lambda_2 N_2(t)}{q_2(t)} = \Lambda_2(t)\,N_2(t), \tag{1.1}$$

where the functions

$$q_1(t) = 1 + \alpha_1 N_1(t) + \beta_1 N_2(t), \qquad q_2(t) = 1 + \alpha_2 N_2(t) + \beta_2 N_1(t) \qquad (\lambda, \alpha, \beta > 0),$$

and the constants $\log_e \lambda_1 = r_1$, $\log_e \lambda_2 = r_2$ are the intrinsic rates of increase of the two species.
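This system will have a stationary state when $q_1 = \lambda_1$ and $q_2 = \lambda_2$, or when

$$L_1 = \frac{\alpha_2(\lambda_1-1) - \beta_1(\lambda_2-1)}{\alpha_1\alpha_2 - \beta_1\beta_2}, \qquad L_2 = \frac{\alpha_1(\lambda_2-1) - \beta_2(\lambda_1-1)}{\alpha_1\alpha_2 - \beta_1\beta_2}. \tag{1.2}$$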
If we define the ratios

$$x = \frac{\beta_1(\lambda_2-1)}{\alpha_2(\lambda_1-1)}, \qquad y = \frac{\beta_2(\lambda_1-1)}{\alpha_1(\lambda_2-1)},$$

then, in the deterministic model, the following four possibilities arise:

x < 1, y < 1. Stable stationary state (both $S_1$ and $S_2$ persist).
x > 1, y > 1. Unstable stationary state (either $S_1$ or $S_2$ persists, depending on the initial
state of the system).
x < 1, y > 1. $S_1$ persists and $S_2$ disappears from the system.
x > 1, y < 1. $S_2$ persists and $S_1$ disappears from the system.
By a suitable choice of the parameters in (1.1) we thus can construct a numerical system
which will fall into one or other of these four categories. 

An innumerable variety of different stochastic models can be imagined which will have 
the same deterministic equivalent, depending on the assumptions made as to the birth-rate 
and death-rate functions for the two populations (cf. Bartlett, 1957). But these possibilities 
can be regarded as falling between the extreme cases of either the birth-rate or the death- 
rate of each species remaining constant. In order to study the qualitative properties of this 
type of system, we may work in terms of these two limits, and we shall consider, therefore, 
the following two models. 


Model I, in which it is assumed that the birth-rate of each species remains constant
We have in (1.1) the constant

$$\lambda_a = e^{r_a} = e^{b_a - d_a} \qquad (a = 1, 2), \tag{1.3}$$

where the intrinsic rate of increase $r_a$ is the difference between a birth-rate $b_a$ and a death-rate $d_a$
($b_a > d_a$). Since the birth-rate of each species is assumed to remain constant, we have
for the discrete interval of time t to t + 1,

$$\log_e[\lambda_a/q_a(t)] = \log_e \Lambda_a(t) = b_a - d_a(t) \qquad (a = 1, 2),$$

where the death-rate $d_a(t)$ is a function of $N_1(t)$ and $N_2(t)$, and is regarded as remaining
constant during the interval. Then, from the standard theory of simple 'birth' and 'death'
processes, it may be shown that if we adopt for our two hypothetical species values of λ,
b and d in (1.3) such as

      λ        b         d
     2.0    1.0083    0.3152
     2.5    1.0308    0.1145          (1.4)

then in the stochastic model, we may take

$$E[N_a(t+1)] = \Lambda_a(t)\,N_a(t), \qquad \operatorname{var}[N_a(t+1)] = 2E[N_a(t+1)] \qquad (a = 1, 2), \tag{1.5}$$

where $\Lambda_a(t)$ is defined in (1.1). In order to simplify matters in practice, we assume as an
approximation that $N_a(t+1)$ is distributed normally with μ and σ² given by (1.5), attributing
all negative values to $N_a(t+1) = 0$. Thus, given $N_1(t)$ and $N_2(t)$ at time t, $N_1(t+1)$ and
$N_2(t+1)$ can be calculated with the help of a pair of random normal deviates, and the
processes can then be continued with the resulting values.
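The Model I step translates directly into code. What follows is a minimal Python sketch under stated assumptions. Python's `random.Random.gauss` stands in for the paper's own normal deviates. The layout $q_1 = 1 + \alpha_1N_1 + \beta_1N_2$, $q_2 = 1 + \alpha_2N_2 + \beta_2N_1$ is our reading of (1.1). Function names are hypothetical.

```python
import math
import random

def growth_factors(n1, n2, lam, alpha, beta):
    """Lambda_a(t) = lambda_a / q_a(t) from (1.1), with the assumed layout
    q1 = 1 + alpha1*N1 + beta1*N2 and q2 = 1 + alpha2*N2 + beta2*N1."""
    q1 = 1 + alpha[0] * n1 + beta[0] * n2
    q2 = 1 + alpha[1] * n2 + beta[1] * n1
    return lam[0] / q1, lam[1] / q2

def model1_step(n1, n2, lam, alpha, beta, rng):
    """One unit of time in Model I: each N_a(t+1) is drawn from a normal distribution
    with mean Lambda_a(t) N_a(t) and variance twice that mean; negative draws become 0."""
    new = []
    for n, growth in zip((n1, n2), growth_factors(n1, n2, lam, alpha, beta)):
        mean = growth * n
        draw = rng.gauss(mean, math.sqrt(2 * mean)) if mean > 0 else 0.0
        new.append(max(draw, 0.0))
    return new[0], new[1]

# Usage: five steps from N1 = N2 = 20 with lambda1 = 2.5, lambda2 = 2.0.
rng = random.Random(1)
state = (20.0, 20.0)
for t in range(5):
    state = model1_step(*state, (2.5, 2.0), (0.0030, 0.0025), (0.0105, 0.0050), rng)
    print(t + 1, [round(v, 1) for v in state])
```

The example parameters follow the values used in section 4. The coefficient 0.0050 of $N_1$ in $q_2$ is taken as an assumption.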


Model II, in which it is assumed that the death-rate of each species remains constant
In this case we now have in (1.1) for the interval of time t to t + 1,

$$\log_e[\lambda_a/q_a(t)] = \log_e \Lambda_a(t) = b_a(t) - d_a \qquad (a = 1, 2),$$

where the birth-rate $b_a(t)$ of each species is a function of $N_1(t)$ and $N_2(t)$, and the death-rate $d_a$
remains constant. Because no meaning can be attached to a negative birth-rate, we therefore
have to define

$$\Lambda_a(t) = \lambda_a/q_a(t) \qquad (q_a(t) \le e^{b_a}),$$

since $b_a(t) = 0$ when $q_a(t) = e^{b_a}$; and

$$\Lambda_a(t) = e^{-d_a} \qquad (q_a(t) > e^{b_a}).$$
Then, provided we adopt the same values of λ, b and d as given in (1.4), we have

$$E[N_a(t+1)] = \begin{cases}\Lambda_a(t)\,N_a(t) & (\lambda_a \ge \Lambda_a(t) \ge e^{-d_a}),\\ e^{-d_a}\,N_a(t) & (q_a(t) > e^{b_a}),\end{cases}$$

$$\operatorname{var}[N_a(t+1)] = \begin{cases} f[\Lambda_a(t)]\,\Lambda_a(t)\,N_a(t) & (\lambda_a \ge \Lambda_a(t) \ge e^{-d_a}),\\ (1-e^{-d_a})\,e^{-d_a}\,N_a(t) & (q_a(t) > e^{b_a}),\end{cases} \qquad (a = 1, 2) \tag{1.6}$$

where the functions $f[\Lambda_a(t)]$ in the expression for the variance, for the two values of λ, are
given by the linear relationships,

λ = 2.5:  f[Λ(t)] = −0.87 + 1.10 Λ(t),
λ = 2.0:  f[Λ(t)] = −0.66 + 1.29 Λ(t).

As before, we assume that given $N_1(t)$ and $N_2(t)$ at time t, then $N_1(t+1)$ and $N_2(t+1)$ are
distributed normally with these means and variances.
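Below is a sketch of the Model II moments in Python, with the same caveats as before. The function name and the dictionary layout are ours. The value d = 0.3152 for λ = 2.0 is the one implied by $b - d = \log_e 2$ with b = 1.0083.

```python
import math

# Values from (1.4) and the linear f[Lambda] relationships, keyed by lambda.
PARAMS = {
    2.0: {"b": 1.0083, "d": 0.3152, "f": (-0.66, 1.29)},
    2.5: {"b": 1.0308, "d": 0.1145, "f": (-0.87, 1.10)},
}

def model2_moments(lam, q, n):
    """Mean and variance of N_a(t+1) under Model II, per (1.6), given lambda_a, q_a(t), N_a(t)."""
    b, d = PARAMS[lam]["b"], PARAMS[lam]["d"]
    c0, c1 = PARAMS[lam]["f"]
    if q > math.exp(b):
        # The birth-rate would be negative, so only deaths occur during the interval.
        survive = math.exp(-d)
        return survive * n, (1 - survive) * survive * n
    growth = lam / q                       # Lambda_a(t), between e^{-d} and lambda since q >= 1
    return growth * n, (c0 + c1 * growth) * growth * n

print(model2_moments(2.0, 1.5, 100.0))    # below the threshold q = e^b
print(model2_moments(2.0, 4.0, 100.0))    # above it: pure death
```

As an informal check of our own, consider a simple birth-and-death process over one unit of time. Its exact variance per initial individual is $\Lambda(\Lambda-1)(\log_e\Lambda + 2d)/\log_e\Lambda$, and the linear f[Λ] reproduces that to within about 1 per cent over 1 < Λ ≤ λ.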


3. PROGRAMMING OF THE MODELS 


The programme was so arranged that the constants $\alpha_1, \alpha_2, \beta_1, \beta_2, \lambda_1, \lambda_2$ were read into the
computer at the beginning of each run, together with a pair of random numbers and the 
initial population size for each species. It was thus possible to vary these parameters very 
easily and so study the models under different conditions. The random numbers were re- 
quired to start a process for producing pseudo-random numbers and eventually random 
normal deviates; the possibility of using existing tables of random numbers was excluded 
since about 40,000 numbers might be required for each initial point estimated. The method 
adopted was as follows: 
(i) Choose p and x' at random, e.g. from a table of random numbers.

(ii) Replace x' by x, the closest number to x' such that x ≡ 5 (mod 8).

(iii) Form successively the numbers $px^n \bmod 2^s$, n = 1, 2, 3, ....

Under these conditions it can be shown (the proof resting on a theorem of Euler) that the
numbers so formed form a repeating cycle of $2^{s-2}$ different numbers. In fact the successive
powers of x are all the $2^{s-2}$ numbers (mod $2^s$) whose last two binary digits are 01, where for
the Elliot 401 computer s is taken to be 32. By choosing different values of p the sequence
may be generated in different orders. The main advantage of this over other methods 
advocated for generating pseudo-random numbers is that it is impossible to get into a closed 
loop generating zero or the same few numbers over and over again. Tests for various types 
of departure from randomness for this method have been reported in the literature (Foster, 
1954; Taussky & Todd, 1956), and it has generally been found to be satisfactory. Con- 
sequently, only a very simple test, which may be regarded as a form of quality control, was 
incorporated in the programme. 
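Steps (i) to (iii) can be sketched in Python as follows. This is a hypothetical rendering, not the Elliot 401 code. We assume p is odd, since an even p would shorten the cycle. Dividing each value by $2^s$ gives a pseudo-random number in (0, 1).

```python
from itertools import islice

def nearest_5_mod_8(x_prime: int, s: int = 32) -> int:
    """Step (ii): the number closest to x' that is congruent to 5 (mod 8), reduced mod 2**s.
    When two candidates are equally close, the smaller one is taken."""
    r = (x_prime - 5) % 8
    x = x_prime - r if r <= 4 else x_prime + (8 - r)
    return x % (2 ** s)   # 2**s is a multiple of 8, so the residue 5 (mod 8) is kept

def congruential_stream(p: int, x_prime: int, s: int = 32):
    """Step (iii): yield p * x**n mod 2**s for n = 1, 2, 3, ... (p assumed odd)."""
    modulus = 2 ** s
    x = nearest_5_mod_8(x_prime, s)
    value = p % modulus
    while True:
        value = (value * x) % modulus
        yield value

# First five pseudo-random fractions from arbitrary (hypothetical) starting values.
fractions_01 = [v / 2 ** 32 for v in islice(congruential_stream(12345, 987654321), 5)]
print(fractions_01)
```

With s = 10 and p = 1, the 256 successive powers of x run through exactly the residues congruent to 1 mod 4, as stated above.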

Random normal deviates were produced by summing twelve variates (uniform in the
range −½ < u < ½) produced by the above method. Such a deviate will have zero mean and
unit variance. A running total was kept of all deviates of magnitude greater than two, 
together with the total number of deviates used. These deviates were not rejected since 
otherwise an undesirable bias would have been introduced into the process. 
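The construction of normal deviates from twelve uniforms, with the running count of deviates beyond ±2, can be sketched as follows. The sketch is in Python and uses `random.random` for brevity, though any uniform source on (0, 1) could be passed in. The function name is ours.

```python
import random

def normal_deviate(uniform=random.random) -> float:
    """Approximately normal deviate: the sum of twelve variates uniform on (-1/2, 1/2).
    Each has variance 1/12, so the sum has mean 0 and variance 1."""
    return sum(uniform() - 0.5 for _ in range(12))

rng = random.Random(2024)
deviates = [normal_deviate(rng.random) for _ in range(10_000)]
beyond_two = sum(abs(d) > 2 for d in deviates)   # running count kept; deviates are not rejected
print(beyond_two, len(deviates))
```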

For each pair of initial population values in the case of a system with an unstable stationary
state, the programme was arranged to run through sixty representations of the population
growth, stopping when one or other of the species had become zero. For each representation
the number of units of time required to reach extinction and the particular species which had 
vanished were recorded in the machine. After all sixty trials the probability that species $S_1$
survived and its standard deviation were printed, together with the distribution of the time
to extinction for each species and the observed and expected number of normal deviates
used of magnitude greater than two. (These 'normal' deviates are in fact the sum of twelve
uniform variates, as explained above, so that the appropriate percentage point is 0.04455 as
opposed to 0.04550 for the normal distribution (Hall, 1927).)
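The quoted percentage point can be checked exactly. The sum of twelve uniform (0, 1) variates has the Irwin–Hall distribution, whose distribution function is $\frac{1}{12!}\sum_{j=0}^{\lfloor x\rfloor}(-1)^j\binom{12}{j}(x-j)^{12}$. The deviate is this sum minus 6. Below is a Python sketch; the helper name is ours.

```python
from fractions import Fraction
from math import comb, factorial
from statistics import NormalDist

def irwin_hall_cdf(x: Fraction, k: int = 12) -> Fraction:
    """Exact P(U_1 + ... + U_k <= x) for independent uniform (0, 1) variates, 0 <= x <= k."""
    total = sum((-1) ** j * comb(k, j) * (x - j) ** k for j in range(int(x) + 1))
    return Fraction(total) / factorial(k)

# deviate = sum - 6, and the sum is symmetric about 6, so
# P(|deviate| > 2) = P(sum < 4) + P(sum > 8) = 2 * P(sum <= 4).
tail_twelve = 2 * irwin_hall_cdf(Fraction(4))
tail_normal = 2 * (1 - NormalDist().cdf(2))
print(round(float(tail_twelve), 5), round(tail_normal, 5))
```

It returns approximately 0.04455, compared with about 0.0455 for an exactly normal deviate.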

The programmes for the two models differed only in the evaluation of the expression for
var[$N_a(t+1)$], so that the modifications required were almost trivial.


4. THE PROPERTIES OF A SYSTEM WITH AN UNSTABLE STATIONARY STATE 


When the deterministic model has an unstable stationary state, the final outcome of the 
interaction depends on the initial state of the system. Consequently, in the stochastic model, 
it might be expected that random variations, more particularly in the early stages of 
population growth, could be an important factor in deciding which of the two species would 
survive (Bartlett, 1957). 
In order to investigate this point, the numerical values of the parameters in (1.1) were
taken as

$$\lambda_1 = 2.5:\quad q_1 = 1 + 0.0030N_1 + 0.0105N_2,$$

$$\lambda_2 = 2.0:\quad q_2 = 1 + 0.0025N_2 + 0.0050N_1,$$

which, from (1.2), give an unstable stationary state with $L_1 = 150$ and $L_2 = 100$.
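Equation (1.2) and the numerical values just given can be checked together. The Python sketch below assumes the layout $q_2 = 1 + 0.0025N_2 + 0.0050N_1$. This is our reading: with that layout, 0.0050 is the coefficient of $N_1$ that yields $L_1 = 150$ and $L_2 = 100$.

```python
def stationary_state(lam, alpha, beta):
    """Solve q1 = lambda1, q2 = lambda2 for (L1, L2) as in (1.2), with the assumed layout
    q1 = 1 + alpha1*N1 + beta1*N2 and q2 = 1 + alpha2*N2 + beta2*N1."""
    (l1, l2), (a1, a2), (b1, b2) = lam, alpha, beta
    det = a1 * a2 - b1 * b2
    L1 = (a2 * (l1 - 1) - b1 * (l2 - 1)) / det
    L2 = (a1 * (l2 - 1) - b2 * (l1 - 1)) / det
    return L1, L2

lam, alpha, beta = (2.5, 2.0), (0.0030, 0.0025), (0.0105, 0.0050)
L1, L2 = stationary_state(lam, alpha, beta)
# At the stationary state each q_a equals lambda_a, so (1.1) leaves (L1, L2) unchanged.
q1 = 1 + alpha[0] * L1 + beta[0] * L2
q2 = 1 + alpha[1] * L2 + beta[1] * L1
print(round(L1, 6), round(L2, 6), round(q1, 6), round(q2, 6))
```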

Suppose, for example, that in the deterministic model we take N₁(0) = N₂(0) = 20; then by a repeated application of equations (1·1) it appears that, given these initial conditions, the species S₂ will always survive and S₁ will disappear from the system. (Given other initial conditions, of course, this would not necessarily be the case.) But in both types of stochastic model, I and II, it was found that with these same initial conditions the processes sometimes went in one direction and sometimes in the other. A few typical (N₁, N₂) trajectories for model I are given in Fig. 1. It was clear from these preliminary results that at any point on the (N₁, N₂) plane there would be a probability, 0 ≤ p ≤ 1, that ultimately the species S₁ would survive and S₂ disappear from the system. It was of interest, therefore, to determine the contour lines of p for the two extreme types of stochastic model.
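The statement about the deterministic model can be checked by iterating the map directly. This sketch assumes that (1·1) has the form $N_\alpha(t+1) = \lambda_\alpha N_\alpha(t)/q_\alpha(t)$, with each $q_\alpha$ listing the species' own numbers first as in the parameter values above; that form is an assumption here, although it reproduces the model I expectations (160 and 137) quoted later in this section for the point (225, 175). `make_map` and `iterate` are hypothetical helper names.

```python
from typing import Callable, Tuple

Pair = Tuple[float, float]


def make_map(lam1: float, a1: float, b1: float,
             lam2: float, a2: float, b2: float) -> Callable[[float, float], Pair]:
    """Deterministic map N_i(t+1) = lam_i * N_i(t) / q_i(t), with
    q_1 = 1 + a1*N1 + b1*N2 and q_2 = 1 + a2*N2 + b2*N1 (own species first)."""
    def f(n1: float, n2: float) -> Pair:
        q1 = 1.0 + a1 * n1 + b1 * n2
        q2 = 1.0 + a2 * n2 + b2 * n1
        return lam1 * n1 / q1, lam2 * n2 / q2
    return f


def iterate(f: Callable[[float, float], Pair], n1: float, n2: float,
            steps: int = 200) -> Pair:
    """Apply the map repeatedly and return the final (N1, N2)."""
    for _ in range(steps):
        n1, n2 = f(n1, n2)
    return n1, n2


unstable = make_map(2.5, 0.0030, 0.0105, 2.0, 0.0025, 0.0050)
print(unstable(150.0, 100.0))         # the stationary state maps (almost exactly) to itself
print(iterate(unstable, 20.0, 20.0))  # N1 -> 0 and N2 -> 400
print(unstable(225.0, 175.0))         # about (160.1, 136.6)
```

Starting from N₁(0) = N₂(0) = 20, the iteration drives N₁ towards zero and N₂ towards its single-species level (λ₂ − 1)/0·0025 = 400.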

We give in Table 1 the estimated values of p, in the case of model I, for a number of points on the plane, each of these estimates being based on 60 replicates. These points are regarded either as the initial conditions at t = 0 of some particular system, or as the conditions at time t of a system which started at some previous time t = −a. By a suitable shift in the origin of the time scale for the latter, the two cases become equivalent. The general pattern of the distribution of p can be seen from the entries in this table. For a fixed value of N₂(0), p steadily increases with increasing N₁(0) until it becomes 100%. If the trajectory of a particular replicate were to reach the region where p ≈ 1, this means that the species S₁ is almost certain to survive, and that the chance of any reversal in the trend is negligible. Conversely, a trajectory reaching the region where p ≈ 0 means that the species S₂ is almost certain to survive.
In order to map the contour lines of p, we made use of the fact that, for each value of N₂(0) in Table 1, the probit of p was linearly related to log N₁(0). Probit regression lines were calculated in the usual way, and in each case the χ² goodness-of-fit test was satisfactory. The only feature of these lines which should be mentioned is that the slopes steadily increased with increasing N₂(0); in other words, a probit plane could not be fitted to the results given in Table 1. The contour lines for p = 95, 50 and 5% which were calculated from these regressions of probit p on log N₁(0) are given in Fig. 2, the irregularities in the figure being due to the sampling errors involved in the estimates. The spread of the contour lines is fan-shaped, and in this numerical system the 50% line passes through the unstable equilibrium point, L₁ = 150, L₂ = 100. (The estimated 50% point for N₂(0) = 100 was N₁(0) = 155, with a fiducial range (P = 0·95) of 149–162.) However, there is no reason to think that this would necessarily be true for all systems with an unstable stationary state.
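The following sketch shows how a 50% point can be read from a probit line of the kind described above. It fits probit p = A + B log N₁(0) by ordinary least squares to the empirical probits (normal equivalent deviate plus 5). This is a simplification of the usual weighted maximum-likelihood probit fit, not the authors' computation. Estimates of 0% or 100% must be omitted, because their empirical probits are infinite. The function names are illustrative.

```python
from math import exp, log
from statistics import NormalDist
from typing import Sequence


def probit(p: float) -> float:
    """Classical probit: normal equivalent deviate of p, plus 5."""
    return NormalDist().inv_cdf(p) + 5.0


def fifty_percent_point(n1_values: Sequence[float], p_values: Sequence[float]) -> float:
    """N1(0) at which the fitted line probit(p) = A + B*log N1(0) equals 5 (p = 0.5).

    Unweighted least squares; every p must satisfy 0 < p < 1."""
    xs = [log(n) for n in n1_values]
    ys = [probit(p) for p in p_values]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return exp((5.0 - intercept) / slope)
```

For one fixed N₂(0), pass the tested N₁(0) values and the estimated proportions (for example 0·35 for 35%). Repeating this for each N₂(0) traces out the 50% contour.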


Fig. 1. Selected complete trajectories of population size, model I (the initial population in each case is N₁ = N₂ = 20). [Figure: trajectories in the (N₁, N₂) plane, N₁ horizontal.]


The results of a similar series of calculations using model II are given in Table 2. For relatively small values of the initial numbers the probability p appears to be much the same in the two models (cf. the estimates for N₂(0) = 20 and variable N₁(0) in Tables 1 and 2); but in model II, as N₂(0) increased in magnitude, the slopes of the probit lines became steadily greater than those for model I, leading to a narrower band of probabilities lying between p ≈ 0 and p ≈ 1. This is shown in the comparable graph of the 95, 50 and 5% contour lines of p, given in Fig. 3. This smaller spread is presumably due to the smaller variance of this model.*


* That the smaller spread of the contour lines in Fig. 3, compared with that in Fig. 2, is due to the smaller variance of model II was confirmed accidentally by a set of calculations using model I in which, through an error in programming, it was assumed that var$[N_\alpha(t+1)]$ for each species was equal to 0·5 $E[N_\alpha(t+1)]$, instead of the correct approximation var$[N_\alpha(t+1)] \approx 2E[N_\alpha(t+1)]$. Exactly the same pattern of contour lines emerged as in Fig. 2, but the fan-shaped spread was very much less.
From the fitting of the probit regression lines for each fixed value of N₂(0) in Tables 1 and 2, the estimated 50% points for N₁(0) were the same in the two models, apart from errors of random sampling, up to N₂(0) = 125; but for N₂(0) = 150 and 175 they were significantly less in model II than in model I, leading to the curvature of the contour lines which can be seen in Fig. 3. The explanation of this difference between the two models appears to be that


Table 1. Estimates of the probability p (%) that the species S₁ will survive, for varying N₁(0) and N₂(0), in the case of model I

[Table body not legible in the source. Column headings: N₂(0) = 20, 50, 75, 100, 125, 150, 175; rows indexed by N₁(0).]

Each estimate of p is based on 60 replicates.


in this particular numerical system the processes may be involved with the discontinuities in model II, which are due to the restriction that the birth-rate of each species becomes zero when $q_\alpha = e^{b_\alpha}$ (α = 1, 2). When $q_\alpha < e^{b_\alpha}$ the expected numbers $E[N_\alpha(t+1)]$ are the same in both models for given N₁(t) and N₂(t); but if this is not the case, then the expectations are different.

To take as an example a typical point in this region of the plane, suppose that N₁(0) = 225 and N₂(0) = 175. Then, from (1·1) and (1·4) we have

$$q_1 = 3.5125 > e^{b_1} = 2.8033, \qquad q_2 = 2.5625 < e^{b_2} = 2.77409.$$

Hence from (1·5) and (1·6) the expected numbers at t = 1 are E[N₁(1)] = 201, E[N₂(1)] = 137 in the case of model II, and E[N₁(1)] = 160, E[N₂(1)] = 137 in model I. Thus, the trajectories
Table 2. Estimates of the probability p (%) that the species S₁ will survive, for varying N₁(0) and N₂(0), in the case of model II

[Table body not legible in the source. Legible column headings: 20, 50, 75, 100, 125, 150.]

* Estimate based on 120 replicates.
† Estimate based on 180 replicates.
All other estimates based on 60 replicates.


Fig. 2. Contour lines for percentage probability that S₁ survives and S₂ disappears, model I. [Figure: 5, 50 and 95% contour lines of p; horizontal axis N₁.]
for the two models start off by following, on the average, different paths, and the divergence between them becomes greater with increasing time. From the direction of the difference between the mean paths for given N₂ (e.g. E[N₁(1)] for II > E[N₁(1)] for I, with E[N₂(1)] = 137 in both), we should expect, therefore, that the probability p associated with these initial conditions would be greater in the case of model II than in model I. Since the majority of the points tested for fixed N₂(0) = 150 and 175 fell in the same region where $q_1 > e^{b_1}$, $q_2 < e^{b_2}$, there would tend to be a greater probability of the species S₁ surviving in the constant death-rate type of model for such initial conditions of this numerical system.


Fig. 3. Contour lines for percentage probability that S₁ survives and S₂ disappears, model II. [Figure: contour lines of p; horizontal axis N₁.]


The difference between the two models for these rather extreme initial states of the system 
in relation to the unstable stationary point is possibly of less interest, however, than the 
agreement between them for the remaining points tested. It can be seen from Figs. 2 and 3 
that, broadly speaking, the general pattern of the contour lines of p in the remaining regions 
of the plane is very similar for the two models, apart from the degree of spread, and we can 
infer that the results for all the other possible types of stochastic model which have this same 
deterministic equivalent would fall somewhere in between the results for these two limiting cases.

5. RESULTS FOR A SYSTEM WITH A STABLE STATIONARY STATE 


As a contrast to this type of system, suppose we take the case of a stable stationary state with L₁ = 150 and L₂ = 100, for example if in (1·1) we have

$$\lambda_1 = 2.5,\quad q_1 = 1 + 0.00800\,N_1 + 0.00300\,N_2,$$
$$\lambda_2 = 2.0,\quad q_2 = 1 + 0.00625\,N_2 + 0.00250\,N_1.$$


Two realizations were calculated for this system using model I, because of the saving of machine time in the case of this model, and also because of its greater variance. Taking the initial state of the system as N₁(0) = N₂(0) = 20, the values of N₁(t) and N₂(t) were printed off at each step in the calculations, the first realization being computed up to t = 149, and the second up to t = 70.

These processes rapidly approached their equilibrium levels, in the region of which they then continued to fluctuate in an irregular fashion. As an illustration, the results for the second realization are given in Fig. 4, neglecting the first six steps in the calculation during
Fig. 4. Fluctuations around the stable stationary state in the case of the second replicate: species S₁ (equilibrium level L₁ = 150); species S₂ (equilibrium level L₂ = 100). [Figure: N₁ and N₂ plotted against t, up to t = 70.]

Table 3. Frequency distribution of the observed N₁ and N₂ when the processes were in the region of the stable stationary state

[Body of the bivariate table not legible in the source. Columns are classes of N₂ (50–, 60–, ..., 140–); rows are classes of N₁.]

N₂ class:  50–  60–  70–  80–  90–  100–  110–  120–  130–  140–
Total:      13   15   10   28   33    33    33    23    10     4
which time an approach was being made to the steady state, L₁ = 150, L₂ = 100. It will be seen that although these processes fluctuate around the latter, there is at times a tendency in both cases for a drift to occur away from this region. Thus, in Fig. 4, the numbers of the
second species, after fluctuating somewhat above the equilibrium level from t = 20 until 
about t = 35, then started a slow drift towards the base-line, but later recovered so that at 
the time the calculations were stopped, the numbers of this species had again returned to the 
region of the steady state. During this time the numbers of the first species tended to drift in 
the opposite direction. The same phenomenon, in a varying degree, was also apparent in 
the results for the first realization, and most probably it is associated with the negative 
correlation which exists between the numbers of the two species. 

We give in Table 3 the observed bivariate distribution of N₁ and N₂ for the combined data for the two realizations in the region of the stationary state. From this table we have N̄₁ = 148·3 and N̄₂ = 97·2, while var(N₁) = 553·8, var(N₂) = 570·3 and cov(N₁, N₂) = −243·4. The type of distribution seems to be approximately normal in form. Thus, if we calculate the expected marginal distributions from the given estimates of the means and variances, we find that for the marginal distribution of N₁, χ² = 7·3 for 7 d.f., a perfectly reasonable value to obtain; while for the distribution of N₂, χ² = 15·1 (7 d.f.), a somewhat excessive value which is due very largely to the deficiency of the observed values in the N₂ = 70–79 class.
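The goodness-of-fit comparisons above can be sketched as follows. The sketch assumes equal-width classes of integer counts (so the class 70–79 covers 69·5 to 79·5), extends the two end classes to cover the tails, and uses a normal distribution with the estimated mean and variance. With two parameters estimated, ten classes leave 10 − 1 − 2 = 7 degrees of freedom, in line with the 7 d.f. quoted. The usage lines apply the sketch to the column totals of Table 3 as read from the source, assumed here to be the N₂ marginal. Because the paper's means and variances come from ungrouped values and its exact grouping is not given, the result should not be expected to reproduce 15·1.

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple


def grouped_normal_chi2(lower_edges: Sequence[float], width: float,
                        observed: Sequence[int], mean: float,
                        var: float) -> Tuple[float, List[float]]:
    """Pearson chi-square of grouped counts against a fitted normal distribution.

    Class i covers [lower_edges[i], lower_edges[i] + width); the first and last
    classes are extended to -inf and +inf so the expected counts sum to the total."""
    dist = NormalDist(mean, var ** 0.5)
    total = sum(observed)
    last = len(observed) - 1
    expected = []
    for i, lo in enumerate(lower_edges):
        p_lo = 0.0 if i == 0 else dist.cdf(lo)
        p_hi = 1.0 if i == last else dist.cdf(lo + width)
        expected.append(total * (p_hi - p_lo))
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected


# Column totals of Table 3 as printed; integer classes 50-59, 60-69, ..., 140-149
edges = [49.5 + 10 * i for i in range(10)]
totals = [13, 15, 10, 28, 33, 33, 33, 23, 10, 4]
chi2, expected = grouped_normal_chi2(edges, 10, totals, mean=97.2, var=570.3)
print(round(chi2, 1), "on", len(totals) - 1 - 2, "d.f.")
```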

It is of interest to compare the variances of these observed fluctuations in the region of the stationary state with those expected on the theory of small fluctuations for the discrete time model used here.* This model may be written as

$$N_\alpha(t+1) = f_\alpha\{N_1(t), N_2(t)\} + Z_\alpha(t+1) \qquad (\alpha = 1, 2),$$

* We are indebted to Prof. M. S. Bartlett for pointing out to us the following method of determining the theoretical variances and covariance for the discrete time model.

21 


Biom. 45 
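where the first term is the deterministic part of the process given by (1·1), and the $Z_\alpha(t+1)$ (α = 1, 2) are independent normal variables with zero means and variances, in the case when it is assumed that the birth-rate of each species remains constant,

$$\sigma^2(Z_\alpha) \approx 2E[N_\alpha(t+1)].$$

In the region of the stationary state

$$\delta N_\alpha(t+1) = N_\alpha(t+1) - L_\alpha,$$

and

$$\sigma_\alpha^2 = E\{(f_\alpha - L_\alpha)^2\} + \sigma^2(Z_\alpha), \qquad \mathrm{cov}(N_1, N_2) = \mathrm{cov} = E\{(f_1 - L_1)(f_2 - L_2)\}.$$

Hence we have the set of equations for determining the variances and covariance of N₁ and N₂ when the fluctuations are regarded as small,

$$\sigma_1^2 = \left(\frac{\partial f_1}{\partial N_1}\right)^2\sigma_1^2 + 2\left(\frac{\partial f_1}{\partial N_1}\right)\left(\frac{\partial f_1}{\partial N_2}\right)\mathrm{cov} + \left(\frac{\partial f_1}{\partial N_2}\right)^2\sigma_2^2 + \sigma^2(Z_1),$$
$$\sigma_2^2 = \left(\frac{\partial f_2}{\partial N_1}\right)^2\sigma_1^2 + 2\left(\frac{\partial f_2}{\partial N_1}\right)\left(\frac{\partial f_2}{\partial N_2}\right)\mathrm{cov} + \left(\frac{\partial f_2}{\partial N_2}\right)^2\sigma_2^2 + \sigma^2(Z_2),$$
$$\mathrm{cov} = \left(\frac{\partial f_1}{\partial N_1}\right)\left(\frac{\partial f_2}{\partial N_1}\right)\sigma_1^2 + \left\{\left(\frac{\partial f_1}{\partial N_1}\right)\left(\frac{\partial f_2}{\partial N_2}\right) + \left(\frac{\partial f_1}{\partial N_2}\right)\left(\frac{\partial f_2}{\partial N_1}\right)\right\}\mathrm{cov} + \left(\frac{\partial f_1}{\partial N_2}\right)\left(\frac{\partial f_2}{\partial N_2}\right)\sigma_2^2,$$

which are to be evaluated for $N_\alpha = L_\alpha$. Thus, to take the first species as an example, we have from

$$f_1 = \frac{\lambda_1 N_1}{1 + \alpha_1 N_1 + \beta_1 N_2}$$

that, at the stationary state (where $q_1 = \lambda_1$),

$$\frac{\partial f_1}{\partial N_1} = 1 - \frac{\alpha_1 L_1}{\lambda_1}, \qquad \frac{\partial f_1}{\partial N_2} = -\frac{\beta_1 L_1}{\lambda_1},$$

and similar expressions in terms of λ₂, α₂, β₂ and L₂ from the function f₂ for the second species.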
Given the numerical values of the parameters in the present example, the equations are

$$0.7296\,\sigma_1^2 + 0.1872\,\mathrm{cov} - 0.0324\,\sigma_2^2 = 300,$$
$$-0.015625\,\sigma_1^2 + 0.171875\,\mathrm{cov} + 0.52734375\,\sigma_2^2 = 200,$$
$$0.0650\,\sigma_1^2 + 0.62\,\mathrm{cov} + 0.12375\,\sigma_2^2 = 0,$$

whence σ₁² = 465·5, σ₂² = 437·4 and cov = −136·1. These expected variances and covariance are less than those actually observed, viz. var(N₁) = 553·8, var(N₂) = 570·3 and cov(N₁, N₂) = −243·4; but, apart from the question of the sampling errors associated with
the observed values, the agreement seems to be reasonable when we consider that the 
expected values are based on the theory of small fluctuations, and so cannot be exactly 
correct. 
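The three linear equations can be set up and solved mechanically. This sketch makes three assumptions: $q_1 = 1 + \alpha_1 N_1 + \beta_1 N_2$ and $q_2 = 1 + \alpha_2 N_2 + \beta_2 N_1$ (the form of the numerical examples), and $\sigma^2(Z_\alpha) = 2L_\alpha$ at the stationary state. It builds the coefficients from the partial derivatives and solves the system by Cramer's rule. The helper names are illustrative.

```python
from typing import List, Sequence


def det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Cramer's rule for a 3x3 linear system."""
    d = det3(m)
    solution = []
    for col in range(3):
        mc = [list(row) for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution


def discrete_small_fluctuations(lam1, alpha1, beta1, lam2, alpha2, beta2, L1, L2):
    """Return (var N1, var N2, cov) for the discrete model near (L1, L2)."""
    # Partial derivatives of f1 = lam1 N1 / q1 and f2 = lam2 N2 / q2 where q_i = lam_i
    j11, j12 = 1 - alpha1 * L1 / lam1, -beta1 * L1 / lam1
    j21, j22 = -beta2 * L2 / lam2, 1 - alpha2 * L2 / lam2
    # Unknowns in the order (sigma1^2, cov, sigma2^2)
    m = [[1 - j11 ** 2, -2 * j11 * j12, -j12 ** 2],
         [-j21 ** 2, -2 * j21 * j22, 1 - j22 ** 2],
         [-j11 * j21, 1 - j11 * j22 - j12 * j21, -j12 * j22]]
    s1, cov, s2 = solve3(m, [2 * L1, 2 * L2, 0.0])
    return s1, s2, cov


print(discrete_small_fluctuations(2.5, 0.008, 0.003, 2.0, 0.00625, 0.0025, 150, 100))
# approximately (465.5, 437.4, -136.1)
```

With the parameters of the stable example, the coefficient matrix reproduces the three numerical equations above term by term.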

It is perhaps worth noting that if we were to derive the theoretical variances from the type of continuous time model suggested by Bartlett (1955, 1957; cf. also Whittle, 1957) for studying the properties of such stochastic processes in the region of the stationary state, the discrepancy between expected and observed would become much greater. Thus, for a small interval of time δt, the equivalent continuous time model for two competing species may be written

$$\delta N_i = (b_i N_i - d_i N_i - a_i N_i^2 - k_i a_i N_i N_j)\,\delta t + \delta Y_i - \delta Z_i \qquad (i, j = 1, 2;\ i \neq j), \qquad (5{\cdot}1)$$
where δYᵢ, δZᵢ (i = 1, 2) are independent, modified, Poisson variables with zero means, and variances, in the case when it is assumed that the birth-rate of each species remains constant,

$$\mathrm{var}(\delta Y_i) = b_i N_i\,\delta t, \qquad \mathrm{var}(\delta Z_i) = (d_i N_i + a_i N_i^2 + k_i a_i N_i N_j)\,\delta t \qquad (i = 1, 2).$$

In the region of the stationary state suppose that $N_i = L_i(1 + u_i)$ (i = 1, 2); then for small $u_i$ we have from (5·1) the linear stochastic system

$$\delta u_1 = -a_1(L_1 u_1 + k_1 L_2 u_2)\,\delta t + \delta\eta_1, \qquad \delta u_2 = -a_2(L_2 u_2 + k_2 L_1 u_1)\,\delta t + \delta\eta_2, \qquad (5{\cdot}2)$$

where, for the constant birth-rate type of model, δη₁ and δη₂ have variances (2b₁/L₁)δt and (2b₂/L₂)δt, respectively. Forming the equations for u₁ + δu₁ and u₂ + δu₂ from (5·2), we obtain by squaring, taking the cross-product and averaging the following expressions for the variances and covariance (σ₁², σ₂² and σ₁₂) of u₁ and u₂. If we write b₁/(a₁L₁) = X₁ and b₂/(a₂L₂) = X₂, then

$$L_1\sigma_1^2 = X_1 - k_1 L_2\,\sigma_{12}, \qquad L_2\sigma_2^2 = X_2 - k_2 L_1\,\sigma_{12}, \qquad (5{\cdot}3)$$

while the covariance is given by

$$(k_1 k_2 - 1)(a_1 L_1 + a_2 L_2)\,\sigma_{12} = a_2 k_2 X_1 + a_1 k_1 X_2.$$

The values of L₁ and L₂ are the same in both types of model, while the relationship between the remaining parameters, in terms of those for the discrete time model, is given by $a_i = \alpha_i \log_e \lambda_i/(\lambda_i - 1)$ and $k_i = \beta_i/\alpha_i$ (i = 1, 2). Since in this particular numerical system b₁ ≈ b₂ ≈ 1, we have from (5·3), expressing these variances and covariance in the form var(Nᵢ) = Lᵢ² var(uᵢ), σ₁² = 242·1, σ₂² = 270·7 and cov = −99·8. Thus it is evident that the theoretical variances of small fluctuations about the steady state are smaller for the continuous time model than for the discrete case.*
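Equations (5·3) and the covariance equation can likewise be evaluated directly. The sketch assumes b₁ = b₂ = 1, as in the text, and the parameter relations $a_i = \alpha_i \log_e \lambda_i/(\lambda_i - 1)$, $k_i = \beta_i/\alpha_i$ given above. The function name is illustrative.

```python
from math import log


def continuous_small_fluctuations(lam, alpha, beta, L, b=(1.0, 1.0)):
    """Variances and covariance of (N1, N2) from the linearised continuous model.

    lam, alpha, beta and L are pairs (species 1, species 2) of discrete-model
    quantities; b holds the birth-rates b_1, b_2."""
    # Continuous-model parameters derived from the discrete ones
    a = [alpha[i] * log(lam[i]) / (lam[i] - 1) for i in range(2)]
    k = [beta[i] / alpha[i] for i in range(2)]
    X = [b[i] / (a[i] * L[i]) for i in range(2)]
    # Covariance of u1, u2 first, then the two variances from (5.3)
    s12 = (a[1] * k[1] * X[0] + a[0] * k[0] * X[1]) / (
        (k[0] * k[1] - 1) * (a[0] * L[0] + a[1] * L[1]))
    s1 = (X[0] - k[0] * L[1] * s12) / L[0]
    s2 = (X[1] - k[1] * L[0] * s12) / L[1]
    # Back to the N scale, using N_i = L_i (1 + u_i)
    return L[0] ** 2 * s1, L[1] ** 2 * s2, L[0] * L[1] * s12


print(continuous_small_fluctuations(lam=(2.5, 2.0), alpha=(0.008, 0.00625),
                                    beta=(0.003, 0.0025), L=(150, 100)))
# approximately (242.1, 270.7, -99.8)
```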
It appears, then, that in a system of two competing species with a stable stationary state, the number of individuals over a relatively long period of time settles down to a type of distribution which is approximately normal in form, but with a degree of variation which is greater than that expected for small deviations about the stable state. This greater degree of variation about the equilibrium level can only lead to an increased chance of random extinction of one or other of the two species.

No calculations were carried out for this system using model II, but it is quite clear that, because of the smaller variance of this model, there would have been a smaller degree of variation about the steady state (cf. the results for a logistic process using these two types of stochastic model (Leslie, 1958, § 8)). Other things being equal, therefore, the chances of random extinction would be less for the second type of model, in which it is assumed that the death-rate of each species remains constant.

* This point also arises in regard to some comparisons which have been made (Leslie, ...) between the theoretical and observed variances of a number of logistic ... about the upper asymptote. A number of replicates were ... These variances were somewhat greater than those expected from the ... It appears now that a better agreement would have been obtained if the ... on the 'Total' sums of squares and the theoretical variances had been ... model. For a logistic process these theoretical variances are given by ... when the birth-rate remains constant, and by ... when the death-rate ...; in each case K is the upper asymptote in numbers. However, ... (Leslie, 1958, § 8) would not affect the principal conclusions, namely ... studied, the chance of extinction was negligible for any given time interval.


6. WHEN ONE SPECIES ALWAYS PERSISTS 


The question remains as to the effect of random fluctuations in the other possibilities which arise in the deterministic model. Thus, if in (1·2) we put

$$\frac{\beta_1(\lambda_2 - 1)}{\alpha_2(\lambda_1 - 1)} = \gamma, \qquad \frac{\beta_1\beta_2}{\alpha_1\alpha_2} = \kappa,$$

then when γ < 1, κ > γ, the species S₁ will always persist and S₂ will disappear from the system. When γ > 1, κ > γ, we have the case of the unstable stationary state, and in the deterministic model there is a sharp demarcation between these possibilities when γ = 1. In the stochastic model, however, these demarcations must be interpreted more liberally. For instance, if we take the parameters of the system to be


$$\lambda_1 = 2.5,\quad q_1 = 1 + 0.00300\,N_1 + 0.00375\,N_2,$$
$$\lambda_2 = 2.0,\quad q_2 = 1 + 0.00250\,N_2 + 0.00500\,N_1,$$

then γ = 1, κ = 2·5, and there is an unstable stationary state with L₁ = 0, L₂ = 400. In the case of the deterministic model, the species S₁ would always persist, whatever the initial conditions of the system; but, in the stochastic model, although the probability of the species S₁ persisting will be p ≈ 1 over most of the (N₁, N₂) plane, there is still a region where there is a non-zero probability q = 1 − p that the outcome of the interaction will be reversed. For instance, the following were the estimates of p (%), based in each case on 60 replicates, for the stated values of N₁(0) and N₂(0) in a system with the above parameters, using model I.


Values of p (%)

      200       350       400
    95·00     81·67     60·00
    98·33     93·33     90·00
   100·00    100·00    100·00


These results for the borderline case suggest that by making progressive changes in the assumed values of the parameters, it would not be difficult to arrive at a system with γ < 1, κ > γ, in which the properties of the deterministic model would be changed in no way by random variations. The same would be true for the case of γ > 1, κ < γ, which in the deterministic model means that the species S₂ will always persist.
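The classification by γ and κ can be applied mechanically to the three numerical systems used in this paper. This sketch assumes $q_1 = 1 + \alpha_1 N_1 + \beta_1 N_2$ and $q_2 = 1 + \alpha_2 N_2 + \beta_2 N_1$, as in the examples. The fourth combination, γ < 1 and κ < γ, is not stated in this section. Labelling it a stable stationary state is an assumption here, although it agrees with the stable example of § 5. Values on a boundary (γ = 1 or κ = γ, within rounding) are reported as borderline. The function name is illustrative.

```python
from math import isclose


def classify(lam1, alpha1, beta1, lam2, alpha2, beta2):
    """Return (gamma, kappa, outcome) for q1 = 1 + alpha1 N1 + beta1 N2 and
    q2 = 1 + alpha2 N2 + beta2 N1."""
    gamma = beta1 * (lam2 - 1) / (alpha2 * (lam1 - 1))
    kappa = beta1 * beta2 / (alpha1 * alpha2)
    if isclose(gamma, 1.0) or isclose(kappa, gamma):
        outcome = "borderline"
    elif gamma < 1 and kappa > gamma:
        outcome = "S1 persists, S2 disappears"
    elif gamma > 1 and kappa > gamma:
        outcome = "unstable stationary state"
    elif gamma > 1 and kappa < gamma:
        outcome = "S2 persists, S1 disappears"
    else:  # gamma < 1 and kappa < gamma
        outcome = "stable stationary state"
    return gamma, kappa, outcome


for name, params in [("section 4", (2.5, 0.0030, 0.0105, 2.0, 0.0025, 0.0050)),
                     ("section 5", (2.5, 0.0080, 0.0030, 2.0, 0.00625, 0.0025)),
                     ("section 6", (2.5, 0.0030, 0.00375, 2.0, 0.0025, 0.0050))]:
    print(name, classify(*params))
```

The § 4 system gives γ = 2·8 and κ = 7 (unstable), the § 5 system gives γ = 0·32 and κ = 0·15, and the § 6 system sits on the boundary γ = 1 with κ = 2·5.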


7. CONCLUSIONS 
We may conclude from the results of these experiments that the most important difference 
between the properties of this stochastic model and its deterministic equivalent is in the case 
of a system with an unstable stationary state. In the deterministic model this implies that 


the outcome will depend on the initial state of the system, and that for any given state the particular outcome is then certain to occur. In the stochastic model, however, there is associated with the state of this type of system at any time t a probability p that ultimately one of the species will survive and the other become extinct, and a probability q = 1 − p that the outcome of the interaction will be reversed.

This feature of the stochastic model is qualitatively very similar to the phenomenon observed by Park (1954) in competing populations of the flour beetles, Tribolium castaneum and T. confusum. The initial conditions adopted in his original experiments (2♂, 2♀ adults of each species) were the same for all replicates, and batches of replicates were observed under six different combinations of temperature and relative humidity. In four of his physical treatments, T. castaneum survived in a certain proportion p of the replicates, and T. confusum survived in the remaining q = 1 − p, the value of p varying according to treatment (Park, 1954, table 12, treatments II–V). By plotting the trajectories of the individual replicates for these four treatments, Neyman, Park & Scott (1956) were able, in each case, to divide the (N₁, N₂) plane empirically into three zones. The two outer zones were 'determinate', since it appeared that if the trajectory of a replicate reached one or other of these zones, only one consequence was then possible; in between there was an 'indeterminate' zone in which the process might still go either in one direction or the other, though not with equal probability. Their figures for these zones (Neyman et al. p. 58) show a very similar type of fan-shaped pattern to the graphs of the contour lines of p which we have given here in Figs. 2 and 3. A further noteworthy point is the relative consistency of the observed values of p in different experiments with these species, when the populations were initiated with the same number of individuals and kept under similar physical conditions (Park & Lloyd, 1955, table 1). In a more recent paper, Park (1957) has examined the relation of the initial numbers to the competitive outcome in populations of these species kept at 34° C., 70% R.H., conditions under which T. castaneum always won and T. confusum was eliminated in his original series of experiments. The results were that out of five different combinations of initial numbers (in terms of eggs) at t = 0, T. castaneum still won in every replicate for four of the combinations; but in the remaining one, the outcome was reversed in some cases, T. confusum winning in five out of the fourteen replicates observed. Thus, the same phenomenon was realized experimentally when the initial conditions of the system were varied.

Clearly, there is a close analogy between the results observed in these experimental 
populations and the qualitative properties of this model for two competing species, when the 
stationary state is unstable. But this analogy cannot be taken as direct evidence that the phenomena observed by Park were due to the existence of this type of stationary state in his competitive systems: it can only be regarded as suggestive. In order to decide whether or not this was the case, some quantitative comparisons between the theoretical model and the observed data would be necessary, and for this the type of model used here is of much too simple a form. In its development we have neglected the effect of a changing age distribution on population growth, and factors such as the mutual cannibalism of eggs and pupae by certain age groups, which are known to occur in the case of Tribolium and which necessarily must have an important effect on the growth in numbers and the interaction between these two species. Nevertheless, the results for this simple stochastic model indicate some of the possibilities which may also arise in the case of the more complex models for this type of interaction.
A PROBLEM IN THE COMBINATION OF ACCIDENT FREQUENCIES


By J. C. TANNER 
Road Research Laboratory, Department of Scientific and Industrial Research 


The paper is concerned with the analysis of road accident frequencies before and after similar changes 
in road conditions are made at each of a number of sites. At any one site it is assumed that the total
number of accidents recorded is binomially distributed between the before period and the after period, 
the parameter of the distribution depending on the relative lengths of the periods as well as on the 
effect of the change. A method of estimating the average effect of the changes is proposed. It is 
shown how the accuracy of this estimate depends not only on the chance variations arising from the 
smallness of the accident frequencies but also on any real differences that may exist between the 
effects of the changes at different places. The methods proposed are illustrated by a numerical example. 


1. THE PROBLEM 


To estimate the effect of a change in road conditions at a particular site on the frequency of 
accidents there, the usual procedure is to obtain details of accidents at the site in convenient 
periods before and after and to compare the ratio after to before with the corresponding ratio 
for a large control area. The latter may be the whole of the police district in which the site lies, or some other area from which trends due to external factors can be reliably assessed. The significance of the difference between the two ratios can be tested in the usual way by means of χ² with 1 degree of freedom (see § 3 below).

Frequently, however, one wishes to combine the data from a representative sample of 
changes of a given type, since the frequencies for any single change are usually too small to 
enable useful conclusions to be drawn. This raises three problems. In the first place, unless 
all the before periods, and also all the after periods, are of the same length (or more generally, 
if the ‘control ratios’ after to before are the same at all sites), it is not immediately obvious 
how the average effect of the type of change concerned should be estimated. Secondly, it is 
desirable to test whether the effect of a given type of change is the same at all sites. Thirdly, 
if there is reason to suppose that it varies then complications arise in testing the significance of the average change.

Expanding on the third point, one of two null hypotheses can be tested: either that over the set of changes actually studied there was no effect on the total expected frequency of accidents, or that over the population of changes of which those actually studied form a sample, there was, on the average, no change in the frequency of accidents. The latter test is on the whole more useful, and is the only one dealt with in detail in this paper.

This paper discusses these matters and suggests appropriate methods of estimation and significance tests. These methods have already been applied extensively to practical problems, a general survey of which has been given by Garwood & Tanner (1956).


2. NOTATION 


N    Number of sites from which data are to be combined.
bᵢ   Number of accidents in the before period at site i (i = 1, 2, ..., N).
aᵢ   Number of accidents in the after period at site i (i = 1, 2, ..., N).
Cᵢ   Ratio of accidents after to before in the control area for site i (assumed free from error).
nᵢ = aᵢ + bᵢ.
kᵢ = aᵢ/(bᵢCᵢ). This measures the apparent effect of the change at site i. It is the ratio of accidents after to the number that would have been expected if the change had no effect.
It is assumed throughout that bᵢ and aᵢ are drawn from a binomial distribution:

$$\left(\frac{1}{1+\kappa_i C_i} + \frac{\kappa_i C_i}{1+\kappa_i C_i}\right)^{n_i},$$

in which nᵢ is regarded as fixed. κᵢ is the 'true' value of kᵢ, i.e. the value that kᵢ would take if bᵢ and aᵢ took their expected values. From some points of view it would have been preferable to assume that bᵢ and aᵢ followed independent Poisson distributions with means μᵢ and κᵢμᵢCᵢ. The extra parameters μᵢ would, however, have complicated the analysis, and the results would have been similar, or in some cases identical.

Throughout this paper, the summation sign Σ denotes summation over sites, from i = 1 to i = N. These limits are omitted to save space. In part of the Appendix, the suffixes i are omitted. All logarithms used in the paper are to base e.


3. ANALYSIS FOR A SINGLE SITE 
When data from only one site are available, there is little choice of method. The obvious procedure is to use k = a/(bC) as an estimate of κ. A value of k greater than unity denotes an increase compared with the control area, while a value less than unity denotes a decrease. To test the significance of the change, one can calculate χ² with one degree of freedom in the usual way, as follows:

$$\chi^2 = \frac{\{a - nC/(1+C)\}^2}{nC/(1+C)} + \frac{\{b - n/(1+C)\}^2}{n/(1+C)} = \frac{(a - bC)^2}{nC}. \qquad (1)$$

It should be noted that this tests whether there was an effect due to the particular change at the particular site, not whether the type of change of which this was a representative is effective.
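The single-site calculation is short enough to show in full. This minimal sketch computes k = a/(bC) and the two-cell χ² of equation (1). The counts in the usage line are hypothetical, not taken from any study. The resulting χ² would be referred to the χ² distribution with 1 degree of freedom, whose 5% point is 3·84.

```python
def single_site(a: int, b: int, C: float):
    """Return (k, chi2) for one site: k = a/(bC) and the 1-d.f. chi-square (1)."""
    n = a + b
    k = a / (b * C)
    exp_a = n * C / (1 + C)  # expected 'after' count if the change had no effect
    exp_b = n / (1 + C)      # expected 'before' count under the same hypothesis
    chi2 = (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return k, chi2


k1, chi2_1 = single_site(a=12, b=20, C=0.8)  # hypothetical counts
print(round(k1, 3), round(chi2_1, 3))        # 0.75 0.625
```

The two-cell form and the simplified form (a − bC)²/(nC) agree exactly, which is a convenient arithmetic check.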
4. ANALYSIS WHEN REAL EFFECTS AT ALL SITES MAY BE ASSUMED EQUAL

We consider here the method of estimation and the test of significance to be used when prior considerations or internal evidence in the data suggest that there is no variation between the real effects κᵢ at the various sites. The determination from the data of whether the κᵢ vary is discussed in § 5.

We use the method of maximum likelihood. No special attempt will be made to justify its 
use in the present context, except to say that it appears to give a sensible answer and that no 
more appropriate method is known to the author.

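The first problem to be dealt with is that of estimating κ, the common value of $\kappa_1, \kappa_2, \ldots, \kappa_N$.

The probability of the sample values $a_1, \ldots, a_N, b_1, \ldots, b_N$ is

$$\prod \binom{n_i}{a_i}\left(\frac{\kappa C_i}{1+\kappa C_i}\right)^{a_i}\left(\frac{1}{1+\kappa C_i}\right)^{b_i}.$$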
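The log-likelihood is therefore

$$L = \sum\left[\log\binom{n_i}{a_i} + a_i\log\kappa + a_i\log C_i - n_i\log(1 + \kappa C_i)\right].$$

Thus

$$\frac{\partial L}{\partial\kappa} = \sum\left[\frac{a_i}{\kappa} - \frac{n_i C_i}{1 + \kappa C_i}\right].$$

Equating to zero, k, the estimate of κ, is given by

$$\sum\frac{a_i - k b_i C_i}{1 + k C_i} = 0. \qquad (2)$$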


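To solve this equation it is most conveniently expressed as

$$\sum\frac{n_i}{1 + k C_i} = \sum b_i. \qquad (3)$$

The left-hand side of this equation may then be calculated for suitable trial values of k until a sufficiently accurate approximation to the solution is obtained. To find the sampling variance of k, we have

$$\frac{\partial^2 L}{\partial\kappa^2} = \sum\left[-\frac{a_i}{\kappa^2} + \frac{n_i C_i^2}{(1 + \kappa C_i)^2}\right].$$

Putting E(aᵢ) = nᵢκCᵢ/(1 + κCᵢ), we obtain

$$E\left(\frac{\partial^2 L}{\partial\kappa^2}\right) = -\frac{1}{\kappa}\sum\frac{n_i C_i}{(1 + \kappa C_i)^2}.$$

Hence

$$\mathrm{var}\,k = -\frac{1}{E(\partial^2 L/\partial\kappa^2)} = \frac{\kappa}{\sum\{n_i C_i/(1 + \kappa C_i)^2\}}. \qquad (4)$$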


The discussion in the Appendix shows that the sampling distribution of log k is rather less 
skew, and more nearly normal, than that of k itself. Thus confidence limits and significance 
tests would be better based on the assumption of a normal distribution of log k. (By a general 
property of maximum likelihood estimates, log k is the M.L. estimate of log κ.)
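The trial-and-error solution of equation (3) can be automated. This sketch uses bisection, which works because the left-hand side of (3) decreases steadily as k increases. It then evaluates expression (5) below with k in place of κ, and forms approximate 95% limits on the log scale, as the preceding paragraph recommends. The site counts in the usage lines are hypothetical, and the function names are illustrative.

```python
from math import exp, log, sqrt
from typing import Sequence


def ml_common_effect(a: Sequence[int], b: Sequence[int], C: Sequence[float],
                     rel_tol: float = 1e-12) -> float:
    """Solve equation (3), sum n_i/(1 + k C_i) = sum b_i, for k by bisection."""
    if sum(a) == 0 or sum(b) == 0:
        raise ValueError("need accidents in both the before and the after periods")
    n = [ai + bi for ai, bi in zip(a, b)]
    target = sum(b)

    def g(k: float) -> float:
        # Decreasing in k: g(0) = sum(a) > 0 and g(k) -> -sum(b) < 0 as k grows
        return sum(ni / (1 + k * ci) for ni, ci in zip(n, C)) - target

    lo, hi = 0.0, 1.0
    while g(hi) > 0:  # widen the bracket until it contains the root
        hi *= 2
    while hi - lo > rel_tol * hi:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def var_log_k(k: float, n: Sequence[int], C: Sequence[float]) -> float:
    """Asymptotic var(log k), expression (5), with the estimate k in place of kappa."""
    return 1.0 / sum(ni * k * ci / (1 + k * ci) ** 2 for ni, ci in zip(n, C))


# Hypothetical counts for three sites (not data from the paper)
a, b, C = [12, 30, 7], [20, 25, 10], [0.8, 1.1, 1.0]
k = ml_common_effect(a, b, C)
se = sqrt(var_log_k(k, [x + y for x, y in zip(a, b)], C))
print(f"k = {k:.3f}, approx. 95% limits {exp(log(k) - 1.96 * se):.3f}"
      f" to {exp(log(k) + 1.96 * se):.3f}")
```

When all the Cᵢ are equal to a common C, equation (3) has the closed-form solution k = Σaᵢ/(CΣbᵢ), which is a useful check on the iteration.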

The asymptotic sampling variance of log k is

$$\mathrm{var}\,\log k = \frac{1}{\sum\{n_i\kappa C_i/(1 + \kappa C_i)^2\}}. \qquad (5)$$

This may be estimated by putting the sample value k in place of κ, or, when testing whether k differs significantly from unity, by putting κ = 1.
For practical purposes, the formula for var log k may be further simplified. The function x/(1 + x)² has its maximum value, 0·25, at x = 1, and takes the same value at x and at 1/x (for example, 0·222 at both x = 0·5 and x = 2·0). Thus if each κCᵢ/(1 + κCᵢ)² in expression (5) is replaced by 0·25, var log k becomes

$$\mathrm{var}\,\log k = \frac{4}{\sum n_i}. \qquad (6)$$


Usually, most values of Cᵢ are fairly close to unity, say between 0·5 and 2·0. Even if all the Cᵢ were 0·5 or 2·0, the standard error would only be under-estimated by a factor (0·222/0·250)^{1/2} = 0·94. In practice the under-estimation would normally be much less than this.

These formulae for var log k are, according to the theory of the method of maximum 
likelihood, only approximations valid for large sample size N. For moderate sample sizes, the 
Appendix suggests that better estimates are obtained by multiplying them by 


\[ 1 + \frac{2}{\sum_i n_i}. \qquad (7) \]


Thus, for example, if Σn_i = 40, the variance from the simpler formula is 5% too low. For larger values of Σn_i, it is probably not worth the trouble of applying this correction.


5. TEST FOR EQUALITY OF THE κ_i


If all the κ_i were in fact equal to a common κ, then the variation between the sample estimates k_i = a_i/(b_i C_i) would arise solely from the binomial distributions to which b_i and a_i are subject. This suggests that it may be possible to use a χ² test for variations between the k_i. Consider the statistic
\[ \chi^2 = \sum_i \frac{(a_i - k b_i C_i)^2}{k C_i n_i}, \qquad (8) \]
where k is the maximum likelihood estimate given by equation (2). Each term of the corresponding expression with κ instead of k has expectation unity, since a_i − κb_iC_i = (1 + κC_i){a_i − E(a_i)} has variance κC_in_i; that expression is approximately distributed as χ² with N degrees of freedom, and replacing κ by the efficient estimate k reduces the degrees of freedom to N − 1.

If χ² calculated from equation (8) is significant at an appropriate level, then it may be concluded that there are differences between the κ_i.
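The test can be sketched in a few lines of standard-library Python. The helper names are hypothetical, and the upper-tail probability uses the ordinary power series for the regularized incomplete gamma function rather than any statistical library; it is adequate for the moderate values of χ² met in practice.

```python
import math

def chi_square_equality(k, a, b, C):
    """Statistic (8), sum (a_i - k b_i C_i)^2 / (k C_i n_i), and its degrees of freedom N - 1.

    k should be the maximum likelihood estimate from equation (2).
    """
    stat = 0.0
    for ai, bi, ci in zip(a, b, C):
        ni = ai + bi
        stat += (ai - k * bi * ci) ** 2 / (k * ci * ni)
    return stat, len(a) - 1


def chi2_sf(x, df):
    """Upper-tail probability P(chi^2_df > x) = 1 - P(df/2, x/2).

    P(a, z) = z^a e^(-z) * sum_n z^n / Gamma(a + n + 1), summed until negligible.
    """
    a, z = df / 2.0, x / 2.0
    if z <= 0.0:
        return 1.0
    term = 1.0 / math.gamma(a + 1.0)   # n = 0 term
    total = term
    j = 0
    while term > 1e-17 * total:
        j += 1
        term *= z / (a + j)
        total += term
    lower = math.exp(a * math.log(z) - z) * total
    return max(0.0, 1.0 - lower)
```

For the value χ² = 25.8 on 6 degrees of freedom found in the numerical example below, `chi2_sf(25.8, 6)` is about 0.0002, consistent with its description there as highly significant.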

This χ² can be used to provide an approximate measure of the variability of the κ_i. This is required later, in § 6. Consider the function χ²(u) obtained by replacing k by u in equation (8). Then in repeated sampling of the b_i and a_i from the binomial distributions corresponding to their own κ_i, the expected value of χ²(u) is
\[ E\{\chi^2(u)\} = \sum_i \frac{\kappa_i (1 + u C_i)^2 + n_i C_i (\kappa_i - u)^2}{u (1 + \kappa_i C_i)^2}. \]


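Putting κ_i − u = x_i and expanding in powers of x_i,
\[ E\{\chi^2(u)\} = N + \sum_i x_i \frac{1 - u C_i}{u (1 + u C_i)} + \sum_i x_i^2 \frac{C_i (n_i - 2 + u C_i)}{u (1 + u C_i)^2} + \dots \]
Suppose now that the κ_i are drawn at random from a population with mean κ and variance var κ_i, and that κ_i and n_i are statistically independent. Suppose also that the sample size N is large enough for the difference between the estimate k and this κ to be negligible. Then, taking u = κ so that E(x_i) = 0 and the linear term vanishes, the expected value of χ², taken over the distribution of the κ_i, is
\[ E(\chi^2) = E\{\chi^2(\kappa)\} = N + \operatorname{var} \kappa_i \sum_i \frac{C_i (n_i - 2 + \kappa C_i)}{\kappa (1 + \kappa C_i)^2}. \qquad (9) \]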
Since E(χ²) = N − 1 when var κ_i = 0, and equation (9) is at best only correct to O(N), it is probably better to replace N by N − 1 on the right-hand side. An expression for var κ_i is then given by
\[ \operatorname{var} \kappa_i = \frac{\chi^2 - (N - 1)}{\sum_i C_i (n_i - 2 + \kappa C_i) / \{\kappa (1 + \kappa C_i)^2\}}. \qquad (10) \]
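Expression (10) is a simple moment estimate; a sketch follows, with a hypothetical function name and κ supplied by the caller (in practice the estimate k).

```python
def var_kappa_between_sites(chi2, kappa, n, C):
    """Moment estimate (10) of var kappa_i, with N - 1 in place of N.

    chi2 is statistic (8), n the site totals n_i = a_i + b_i, C the ratios C_i.
    The estimate is negative whenever chi2 < N - 1.
    """
    N = len(n)
    denom = sum(ci * (ni - 2 + kappa * ci) / (kappa * (1 + kappa * ci) ** 2)
                for ni, ci in zip(n, C))
    return (chi2 - (N - 1)) / denom


# Hypothetical: three sites with n_i = 10 and C_i = 1, evaluated at kappa = 1.
print(var_kappa_between_sites(4.0, 1.0, [10, 10, 10], [1.0, 1.0, 1.0]))
```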


6. ANALYSIS WHEN THE REAL EFFECTS AT ALL SITES CANNOT BE ASSUMED EQUAL 


If it is required to estimate the average effect for the particular changes studied, rather than the average effect in the population of changes of which they form a sample, then the procedure of § 4 is probably satisfactory. This situation, however, is not often likely to be of practical importance and will not be considered further.

Suppose now that in the population of changes of which those studied form a sample, κ_i follows some distribution, with mean κ and variance var κ_i. The most satisfactory procedure would be to assume a convenient functional form for the distribution and estimate its parameters by maximum likelihood. However, all plausible functions that were tried led to intractable mathematics, and so a simpler approximate method has been used to obtain an estimate k of the average effect κ and its sampling variance.

We shall continue to use the equation
\[ \sum_i \frac{a_i - k b_i C_i}{1 + k C_i} = 0, \qquad (2) \]
as in § 4 when the κ_i did not vary. It is intuitively clear that this provides a reasonably satisfactory estimate at least when the variations between the κ_i are fairly small. Then, provided the error of k as an estimate of κ is sufficiently small, we may write
\[ \operatorname{var} \log k = \frac{\operatorname{var} S(\kappa)}{\{\partial S(\kappa) / \partial \log \kappa\}^2}, \qquad (11) \]
where
\[ S(\kappa) = \sum_i \frac{a_i - \kappa b_i C_i}{1 + \kappa C_i}. \]
This assumes that, over a sufficient range of κ, S(κ) is linearly related to log κ. Var S(κ) must be interpreted to include not only chance variations in S(κ) arising from the binomial distributions of the b_i and a_i, but also variations due to the distribution of the κ_i.
First, we have
\[ \left( \frac{\partial S(\kappa)}{\partial \log \kappa} \right)^2 = \left\{ \sum_i \frac{n_i \kappa C_i}{(1 + \kappa C_i)^2} \right\}^2. \qquad (12) \]


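We now derive an expression for var S(κ). Since
\[ S(\kappa) = \sum_i \frac{n_i}{1 + \kappa C_i} - \sum_i b_i, \]
the first term of which is fixed, we have
\[ \operatorname{var} S(\kappa) = \operatorname{var}\left( \sum_i b_i \right) = \sum_i \operatorname{var}(b_i). \qquad (13) \]
Now
\[ \operatorname{var} b_i = E(b_i^2) - \{E(b_i)\}^2, \qquad (14) \]
and, since for given κ_i the count b_i is binomial with index n_i and probability 1/(1 + κ_iC_i),
\[ E(b_i) = n_i E_\kappa\left( \frac{1}{1 + \kappa_i C_i} \right), \qquad (15) \]
\[ E(b_i^2) = n_i E_\kappa\left\{ \frac{\kappa_i C_i}{(1 + \kappa_i C_i)^2} \right\} + n_i^2 E_\kappa\left\{ \frac{1}{(1 + \kappa_i C_i)^2} \right\}, \qquad (16) \]
where the symbol E_κ denotes the expectation over the distribution of κ_i.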
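To evaluate these expectations E_κ, put κ_i = κ + x_i, where κ is the mean of the distribution of κ_i and the x_i are assumed to be small. Then
\[ E_\kappa\left( \frac{1}{1 + \kappa_i C_i} \right) = \frac{1}{1 + \kappa C_i} + \frac{C_i^2 \operatorname{var} \kappa_i}{(1 + \kappa C_i)^3} + \dots, \]
\[ E_\kappa\left\{ \frac{\kappa_i C_i}{(1 + \kappa_i C_i)^2} \right\} = \frac{\kappa C_i}{(1 + \kappa C_i)^2} + \frac{C_i^2 (\kappa C_i - 2) \operatorname{var} \kappa_i}{(1 + \kappa C_i)^4} + \dots, \]
\[ E_\kappa\left\{ \frac{1}{(1 + \kappa_i C_i)^2} \right\} = \frac{1}{(1 + \kappa C_i)^2} + \frac{3 C_i^2 \operatorname{var} \kappa_i}{(1 + \kappa C_i)^4} + \dots \]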
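Substitution in the expressions (15) and (16) for E(b_i) and E(b_i²), and then in expression (14) for var(b_i), gives, on simplification and to the first order in var κ_i,
\[ \operatorname{var} b_i = \frac{n_i \kappa C_i}{(1 + \kappa C_i)^2} + \frac{n_i C_i^2 (n_i + \kappa C_i - 2) \operatorname{var} \kappa_i}{(1 + \kappa C_i)^4}. \qquad (17) \]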
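Thus, finally, from equations (11), (12), (13) and (17),
\[ \operatorname{var} \log k = \frac{\sum_i n_i \kappa C_i / (1 + \kappa C_i)^2 + \operatorname{var} \kappa_i \sum_i n_i C_i^2 (n_i + \kappa C_i - 2) / (1 + \kappa C_i)^4}{\left\{ \sum_i n_i \kappa C_i / (1 + \kappa C_i)^2 \right\}^2}. \]
Substituting for var κ_i its estimate (10) gives
\[ \operatorname{var} \log k = \frac{1 + \phi}{\sum_i n_i \kappa C_i / (1 + \kappa C_i)^2}, \qquad (18) \]
where
\[ \phi = \frac{(\chi^2 - N + 1) \sum_i n_i C_i^2 (n_i + \kappa C_i - 2) / (1 + \kappa C_i)^4}{\left[ \sum_i C_i (n_i + \kappa C_i - 2) / \{\kappa (1 + \kappa C_i)^2\} \right] \left[ \sum_i n_i \kappa C_i / (1 + \kappa C_i)^2 \right]}. \qquad (19) \]
This formula for var log k is the same as was obtained when κ was assumed not to vary from site to site, except for the factor 1 + φ.

The factor 1 + φ may for practical purposes be considerably simplified. In the term n_i + κC_i − 2, κC_i − 2 is small compared with n_i. If it is omitted, φ becomes
\[ \phi = \frac{(\chi^2 - N + 1) \sum_i n_i^2 w_i^2}{\left( \sum_i n_i w_i \right)^2}, \qquad \text{where } w_i = \frac{\kappa C_i}{(1 + \kappa C_i)^2}. \]
Now it has already been noted that the ratio κC_i/(1 + κC_i)² varies only slightly over the range of κC_i of greatest importance, and so φ simplifies further to
\[ \phi = \frac{(\chi^2 - N + 1) \sum_i n_i^2}{\left( \sum_i n_i \right)^2}. \]
Now this expression for φ is only correct to O(1), and so it is permissible to multiply by an extra factor N/(N − 1); this gives
\[ 1 + \phi = 1 + \left( \frac{\chi^2}{N - 1} - 1 \right) \frac{N \sum_i n_i^2}{\left( \sum_i n_i \right)^2}. \qquad (20) \]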
This seems to be a better expression when all the n_i are the same; for in this case 1 + φ reduces to
\[ 1 + \phi = \frac{\chi^2}{N - 1} \qquad (21) \]
instead of to (χ² + 1)/N. The former expression is the ratio of a measure of variability of the k_i to the corresponding quantity when the κ_i do not vary, and it is quite plausible that the sampling variance of k should increase in the same proportion.
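Expression (20) translates directly into code. The sketch below uses a hypothetical function name and checks the reduction to (21) when every n_i is the same.

```python
def heterogeneity_factor(chi2, n):
    """1 + phi from expression (20).

    chi2 -- statistic (8), on N - 1 degrees of freedom
    n    -- list of site totals n_i = a_i + b_i
    """
    N = len(n)
    total = sum(n)
    return 1.0 + (chi2 / (N - 1) - 1.0) * N * sum(ni * ni for ni in n) / total ** 2


# With all n_i equal, (20) collapses to chi2 / (N - 1), expression (21).
print(heterogeneity_factor(12.0, [5, 5, 5, 5]))   # 4.0
```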

It is of interest to note that a factor similar to expression (21) is proposed by Finney (1952) 
for use in probit analysis. Finney calls it a ‘heterogeneity factor’ and advocates its use to 
increase the estimated sampling variance of the median effective dose when the departure 
of sample frequencies from their expected values, as measured by χ², is greatly in excess of
expectation. 

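Adopting expression (20), we have thus obtained for the final estimate of the sampling variance of log k:
\[ \operatorname{var} \log k = \frac{1 + \left( \frac{\chi^2}{N - 1} - 1 \right) \frac{N \sum_i n_i^2}{(\sum_i n_i)^2}}{\sum_i n_i \kappa C_i / (1 + \kappa C_i)^2}, \qquad (22) \]
or approximately
\[ \operatorname{var} \log k = \frac{4}{\sum_i n_i} \left\{ 1 + \left( \frac{\chi^2}{N - 1} - 1 \right) \frac{N \sum_i n_i^2}{(\sum_i n_i)^2} \right\}. \qquad (23) \]

It is suggested that the factor 1 + φ should be used only when χ² is significant or nearly so, say when P < 0.20. For larger probabilities, the random error in φ may outweigh the advantage of having an unbiased estimate of variance.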

Expressions (22) and (23) may, if necessary, be multiplied by the correction factor (7). 


7. SUMMARY OF FORMULAE 


In all cases k, the estimate of κ, or of the average of the κ_i, is the solution of the equation
\[ \sum_i \frac{n_i}{1 + k C_i} = \sum_i b_i. \qquad (3) \]
The sampling variance of log k is approximately
\[ \operatorname{var} \log k = \frac{\left( 1 + \frac{2}{\sum_i n_i} \right) (1 + \phi)}{\sum_i n_i \kappa C_i / (1 + \kappa C_i)^2}, \qquad (24) \]
where
\[ 1 + \phi = 1 + \left( \frac{\chi^2}{N - 1} - 1 \right) \frac{N \sum_i n_i^2}{(\sum_i n_i)^2}, \]
\[ \chi^2 = \sum_i \frac{(a_i - k b_i C_i)^2}{k C_i n_i}. \qquad (8) \]
In the denominator, κ may be replaced by the sample estimate k when the standard error is required, but by unity when it is required to test whether κ = 1.

In appropriate cases, the formula may be simplified as follows:
(i) Omit 1 + φ when there is no firm evidence that κ_i varies from site to site (P > 0.20).
(ii) Omit 1 + 2/Σn_i if Σn_i is reasonably large, say greater than 40.
(iii) Replace the denominator by ¼Σn_i if most of the values of κC_i are in the range ½ to 2.
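The summary formula (24) can be gathered into one function. This is a sketch under the stated rules, not a definitive implementation: the name `var_log_k` and its keyword flags are hypothetical, the flags mirror simplifications (i) and (ii), and k is assumed to have been obtained already from equation (3).

```python
def var_log_k(k, a, b, C, kappa=None, use_phi=True, small_sample=True):
    """Approximate var log k from expression (24).

    kappa        -- value put in the denominator: None means use k (for a
                    standard error); pass 1.0 when testing kappa = 1
    use_phi      -- include 1 + phi (omit it under simplification (i))
    small_sample -- include 1 + 2/sum(n_i) (omit it under simplification (ii))
    """
    kap = k if kappa is None else kappa
    n = [ai + bi for ai, bi in zip(a, b)]
    N, total = len(n), sum(n)
    denom = sum(ni * kap * ci / (1.0 + kap * ci) ** 2 for ni, ci in zip(n, C))
    var = 1.0 / denom
    if small_sample:
        var *= 1.0 + 2.0 / total
    if use_phi:
        # chi-squared statistic (8) always uses the estimate k itself
        chi2 = sum((ai - k * bi * ci) ** 2 / (k * ci * ni)
                   for ai, bi, ci, ni in zip(a, b, C, n))
        var *= 1.0 + (chi2 / (N - 1) - 1.0) * N * sum(ni * ni for ni in n) / total ** 2
    return var
```

Keeping the two factors as switches makes it easy to see how much each contributes, as is done at the end of the numerical example below.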
8. NUMERICAL EXAMPLE


The data concern the effects of installing approximately circular roundabouts at cross-roads. There are seven sites, and the data are as follows:

[Data table for the seven sites: illegible in this copy apart from a totals row reading 49 and 21.]


Thus for site 1, there was 1 accident in the before period, 6 in the after period, while in the 
whole of the area concerned, there were 4% more casualties in the after period than in the 
before period. 


The estimate k of the average effect of the changes is given by equation (3). To solve this, a trial value k = 0.3 was inserted in the left-hand side; this gave 48.16, which is slightly too low. A smaller value of k, 0.2, was then tried; this gave 53.67, which is too high. k therefore lies between 0.2 and 0.3, and by linear interpolation it is found to be k = 0.28, which gives log k = −1.27.

χ² is then calculated from equation (8) and found to be 25.8, with 6 degrees of freedom. This is highly significant, which means first that the effect of providing a roundabout is not the same at all places, and secondly that it is necessary to introduce the factor 1 + φ in the estimate of the variance of log k if one wishes to draw conclusions about the effect of this type of change in general, rather than in the seven specific applications of it being studied.

To test the significance of the departure of k from unity, 1 + φ is found from expression (20) to be 5.281. The sampling variance of log k is given by expression (24); simplification (ii) may be used, but not (i) or (iii). The denominator is calculated with κ = 1 for the purpose of the test of significance. We find var log k = (0.565)² ≈ 0.319, so that S.E.(log k) = 0.565, and t = −1.27/0.565 = −2.25. This value of t is significant at the 5% level and it may therefore be concluded that the provision of roundabouts of the sort studied on the whole tends to reduce accident frequencies.
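The arithmetic of the significance test can be checked with the two figures quoted in the text. The only added assumption is the reference value 1.96, the usual two-sided 5% point of the standard normal distribution.

```python
import math

k = 0.28          # estimate of kappa quoted in the example
se_log_k = 0.565  # standard error of log k quoted for the test of kappa = 1

log_k = math.log(k)
t = log_k / se_log_k
print(round(log_k, 2), round(t, 2))   # -1.27 -2.25
print(abs(t) > 1.96)                  # True: beyond the two-sided 5% normal point
```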

Since it has been established that κ is significantly less than unity, the best estimate of the standard error of log k is not 0.565 obtained above, but 0.610, obtained by replacing κ by its estimate k = 0.28 in the denominator of expression (24).


It is of interest to note how the standard error of log k would have differed from 0.610 if the factor 1 + φ had been omitted and if the denominator of expression (24) had been simplified according to (iii). One finds:


                                   Omitting 1 + φ   Using 1 + φ
Denominator as in (24)                  0.27            0.61
Denominator simplified by (iii)         0.26


Even in this rather extreme case, the simplified denominator gives much the same answer 
as the more accurate one; the factor 1+¢, however, is very important. 
Using the factor (7), the above standard errors would be increased by at most 0.01.


The author is grateful to Dr F. Garwood for a number of suggestions. This paper was 
prepared at the Road Research Laboratory and is published by permission of the Director of 
Road Research. 
APPENDIX
Distributions of k and log k when the κ_i are equal
We study here the moments of the distribution of log k, where k is the solution of equation (2), when all the κ_i are equal. Put y = log k − log κ, i.e. k = κe^y. We shall obtain expressions for the first four moments of y, and then for its second, third and fourth cumulants, which are the same as those of log k.


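We have, from equation (3) with k = κe^y,
\[ 0 = \sum_i b_i - \sum_i \frac{n_i}{1 + \kappa C_i e^{y}}. \]
Expanding in powers of y gives
\[ \varepsilon = \alpha y + \tfrac12 (\alpha - 2\beta) y^2 + \tfrac16 (\alpha - 6\beta + 6\gamma) y^3 + \tfrac1{24} (\alpha - 14\beta + 36\gamma - 24\delta) y^4 + O(y^5), \]
where
\[ \varepsilon = \sum_i \frac{n_i}{1 + \kappa C_i} - \sum_i b_i, \qquad \alpha = \sum_i \frac{n_i \kappa C_i}{(1 + \kappa C_i)^2}, \qquad \beta = \sum_i \frac{n_i \kappa^2 C_i^2}{(1 + \kappa C_i)^3}, \]
\[ \gamma = \sum_i \frac{n_i \kappa^3 C_i^3}{(1 + \kappa C_i)^4}, \qquad \delta = \sum_i \frac{n_i \kappa^4 C_i^4}{(1 + \kappa C_i)^5}. \]

We shall assume that errors in the estimation of κ are small, so that y is small. y is then of the same order of smallness as ε. We may therefore write
\[ y = A\varepsilon + B\varepsilon^2 + C\varepsilon^3 + D\varepsilon^4 + O(\varepsilon^5), \]
where, by straightforward algebra (reversion of the series for ε),
\[ A = \frac{1}{\alpha}, \qquad B = \frac{\beta - \tfrac12 \alpha}{\alpha^3}, \qquad C = \frac{\alpha^2 - 3\alpha\beta + 6\beta^2 - 3\alpha\gamma}{3\alpha^5}, \]
\[ D = \frac{-\tfrac14 \alpha^3 + \alpha^2\beta + \alpha^2\gamma + \alpha^2\delta - \tfrac52 \alpha\beta^2 - 5\alpha\beta\gamma + 5\beta^3}{\alpha^7}. \]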
Further straightforward algebra using the formulae for the moments of the binomial distribution leads to
\[ E(\varepsilon) = 0, \qquad E(\varepsilon^2) = \alpha, \qquad E(\varepsilon^3) = \alpha - 2\beta, \qquad E(\varepsilon^4) = \alpha - 6\beta + 6\gamma + 3\alpha^2. \]

It may now be noted that the expansion of y in powers of ε up to ε⁴ is correct to order N⁻², where N is the sample size. (The n_i and C_i are all assumed to be O(1) for this purpose.) This follows from the facts that A, B, C, ... are of order N⁻¹, N⁻², N⁻³, ... and that the expectations of ε², ε³, ε⁴, ε⁵, ... are of order N, N, N², N², .... In what follows, all terms of higher order than N⁻² will be omitted.
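We now write down the first four moments of y. These are
\[ \mu_1'(y) = A E(\varepsilon) + B E(\varepsilon^2) + C E(\varepsilon^3) + D E(\varepsilon^4) = \frac{2\beta - \alpha}{2\alpha^2} + C (\alpha - 2\beta) + 3\alpha^2 D, \]
\[ \mu_2'(y) = A^2 E(\varepsilon^2) + 2AB\, E(\varepsilon^3) + (B^2 + 2AC) E(\varepsilon^4) = \frac{1}{\alpha} - \frac{(2\beta - \alpha)^2}{4\alpha^4} + \frac{2(\alpha^2 - 3\alpha\beta + 6\beta^2 - 3\alpha\gamma)}{\alpha^4}, \]
\[ \mu_3'(y) = A^3 E(\varepsilon^3) + 3A^2 B\, E(\varepsilon^4) = \frac{7(2\beta - \alpha)}{2\alpha^3}, \]
\[ \mu_4'(y) = A^4 E(\varepsilon^4) = \frac{3}{\alpha^2}. \]

Using the relationships between moments and cumulants, we find, still to order N⁻²,
\[ \kappa_1 = \mu_1'(y), \qquad \kappa_2 = \frac{1}{\alpha} + \frac{\tfrac32 \alpha^2 - 4\alpha\beta + 10\beta^2 - 6\alpha\gamma}{\alpha^4}, \qquad \kappa_3 = \frac{2(2\beta - \alpha)}{\alpha^3}, \qquad \kappa_4 = 0. \]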


The variance κ₂ of y, and therefore of log k, is thus 1/α to order N⁻¹, in agreement with equation (5), which was obtained by a different method. For moderate values of N, however, the second term may be important. If each C_i = 1, then
\[ \kappa_2 = \frac{(1 + \kappa)^2}{\kappa \sum_i n_i} \left\{ 1 + \frac{3 - 2\kappa + 3\kappa^2}{2\kappa} \cdot \frac{1}{\sum_i n_i} \right\}. \]

For values of κ near to unity, the factor ½(3 − 2κ + 3κ²)/κ is near 2 (it reaches 2.75 at κ = 0.5 and κ = 2.0). In most cases it is therefore sufficiently accurate to use 1 + 2/Σn_i as a correcting factor. It should also be noted that the expression for the first moment of y shows that there is a bias of order 1/N in log k as an estimate of log κ. This, however, is likely to be negligible in most circumstances.

The coefficients of skewness and kurtosis are
\[ \gamma_1 = \frac{\kappa_3}{\kappa_2^{3/2}} = \frac{2(2\beta - \alpha)}{\alpha^{3/2}} = O(N^{-1/2}), \qquad \gamma_2 = \frac{\kappa_4}{\kappa_2^2} = O(N^{-1}). \]

This means that as the sample size becomes large, these coefficients of the sampling distribution of log k approach those for a normal distribution, for which γ₁ = γ₂ = 0.


The coefficients γ₁ and γ₂ for the distribution of k, rather than log k, may be obtained similarly, or from the moments of log k. We find
\[ \gamma_1(k) = \frac{4\beta + \alpha}{\alpha^{3/2}} = O(N^{-1/2}), \qquad \gamma_2(k) = O(N^{-1}). \]
Thus the coefficients of k also tend to those for a normal distribution, but not so rapidly, since 4β + α will usually be numerically greater than 2(2β − α). For instance, if each C_i = C, then
\[ \frac{\gamma_1(\log k)}{\gamma_1(k)} = \frac{2(\kappa C - 1)}{5\kappa C + 1}. \]

This lies between +1 and −1 for all κC greater than 1/7 and lies between +0.4 and −0.4 for all κC greater than 0.4. It vanishes for κC = 1, which is in the middle of the likely range of κC. We have therefore demonstrated that, provided certain reasonable assumptions are valid, log k is more nearly normally distributed than k, and so is a more suitable test statistic.
[Figures 1 and 2 (plots not reproduced): distribution function of k for Example 1 and Example 2, in each of which every C_i = 1 and κ = 1, with Σn_i = 10 and Σn_i = 30 respectively. Each figure compares (a) the exact distribution, (b) a lognormal distribution with var log k = 4/Σn_i (= 0.1333 in Example 2) and (c) a lognormal distribution with var log k = (4/Σn_i)(1 + 2/Σn_i).]

Fig. 2. Distribution function of k, Example 2.
Biom. 45


342 A problem in the combination of accident frequencies 


These results are illustrated by two numerical examples, shown in Figs. 1 and 2. They show the distribution of k when each C_i is unity and κ is also unity. In this case
\[ k = \frac{\sum_i a_i}{\sum_i b_i} = \frac{\sum_i n_i}{\sum_i b_i} - 1, \]
where Σb_i follows a binomial distribution with index Σn_i and probability ½.

Figs. 1 and 2 show the resulting distributions of k for Σn_i = 10 and Σn_i = 30, plotted on log-probability scales. It is clear that the distribution of log k is very nearly normal, and that the formula given in (c) for its variance, equivalent to expression (24), is a good approximation. Omitting the factor 1 + 2/Σn_i from the formula for the variance gives a noticeably poorer fit in both diagrams.
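The comparison behind Figs. 1 and 2 can also be made numerically rather than graphically. This sketch assumes the setting just described (every C_i = 1, κ = 1) and, as its own assumption, discards the two outcomes Σb_i = 0 and Σb_i = Σn_i, for which log k is infinite; it requires Python 3.8+ for `math.comb`.

```python
import math

def exact_var_log_k(N):
    """Exact variance of log k when every C_i = 1 and kappa = 1.

    Here k = (N - B) / B with B = sum(b_i) ~ Binomial(N, 1/2). The outcomes
    B = 0 and B = N are excluded and the remaining probabilities renormalized
    (their total weight is only 2 / 2**N).
    """
    probs = [math.comb(N, B) / 2.0 ** N for B in range(1, N)]
    values = [math.log((N - B) / B) for B in range(1, N)]
    total = sum(probs)
    mean = sum(p * v for p, v in zip(probs, values)) / total
    return sum(p * (v - mean) ** 2 for p, v in zip(probs, values)) / total


for N in (10, 30):
    first_order = 4.0 / N                     # expression (6), exact here since kappa*C_i = 1
    corrected = first_order * (1 + 2.0 / N)   # with correction factor (7)
    print(N, round(exact_var_log_k(N), 4), round(first_order, 4), round(corrected, 4))
```

For Σn_i = 30 the exact variance lies closer to the corrected value than to 4/Σn_i, in line with the remark about factor (7).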
THE MULTIPLE-RECAPTURE CENSUS
I. ESTIMATION OF A CLOSED POPULATION 


By J. N. DARROCH 
Department of Mathematics, University of Cape Town 


1. INTRODUCTION 


1-1. A primary classification of the many problems which can now be included under the 
heading of capture-recapture analysis is the one separating the census which uses multiple 
recaptures from the census which does not. 

The best example of the latter type is the ‘fisheries census’, where the main catching is 
done commercially and the experimenter's job is to keep the population supplied with 
tagged individuals. In this case, recaptured fish are obviously not available to him for 
retagging and all he can hope for is that the captured tags are returned to him. Most of the 
paper by Chapman (1954) is devoted to this type of situation and contains some very simple 
estimates derived by the use of large-sample large-population Poisson approximations. 
Gulland (1955) shows how, if the catching is considered as a continuous-time process 
with constant effort, the natural and fishing mortality rates can be estimated from the 
behaviour of tagged fish alone, that is without any knowledge of total catches. It follows 
of course that with this knowledge estimates of population size are also available. 

In the multiple-recapture census it is usually the experimenter who does both tagging 
and sampling and it is assumed that he employs a method of capture which does not kill 
the animal or affect its future behaviour. The experiment then comprises a sequence of 
samples S_1, ..., S_s, say, where the members of S_1, ..., S_{s−1} are all tagged before being returned
to the population, while the members of S_2, ..., S_s are classified according to when, if at all,
they have been captured before. The majority of papers have discussed this census and 
among them may be mentioned Bailey (1951); Chapman (1951, 1952); Craig (1953); 
Goodman (1953); Hammersley (1953); Leslie & Chitty (1951); Leslie (1952) and Moran 
(1952). In all of these papers except Goodman’s, s is a constant. Goodman sets up a model
in which the number of samples depends sequentially on the total number of recaptured
tags, which is stipulated beforehand. As far as the individual sample sizes are concerned, 
we notice that everywhere except in Hammersley’s paper, each sample S; is completed 
when one of its statistics attains a prescribed value. This statistic is usually simply the 
sample size, but in what has come to be known as the inverse sample census it is the number
of tagged or the number of untagged individuals recovered. In this connexion see Bailey 


(1951) and Chapman (1952). It goes without saying that the theory of all these papers can be applied to the estimation of the number of classes in a population if the classes are of equal size and sampling is with replacement. The number of classes represented in a sample constitutes its size and a class is ‘recaptured’ when it is represented in a subsequent sample.

The latest extensions to the general problem have been made by Chapman who exploits 
the natural stratification of animal populations, with respect to type (sex, species) of 
individual (1955) and with respect to place (1956, with Junge). 


344 Multiple-recapture census. I 


1-2. In the present paper we treat the multiple-recapture census for which the number of samples s is fixed (except in §§ 4-4, 4-5 and 5-6). In §§ 3 and 4 the sample sizes are regarded as constants and in § 5 as binomial variables.

Most of the above-mentioned work on the multiple-recapture census has been applied 
to closed populations in which there is neither departure due to death or emigration, nor 
augmentation due to birth or immigration. The restriction to a closed population also 
prevails here, but in a second paper we shall take account of both departure and augmen- 
tation. 


1-3. To extract all the information from a multiple-recapture experiment, tagging must
be differentiated in order that the full ‘history’ of any individual can be inferred each time 
it is captured. This can be effected in two ways. Either an individual is given a numbered 
tag at its first appearance, or, each time it is captured it is given a new mark distinctive of 
that capture. For some purposes, however, similar tagging is sufficient, where all that is 
required is that each individual bears a mark after being captured. It need not be remarked 
when recaptured. 

2. THE ALTERNATIVE MODELS 


2-1. Let n be the total number of individuals in the population. 

Let s be the total, fixed, number of samples taken. 

Let u_i be the number of individuals caught in the ith sample but not otherwise, u_ij the number caught in the ith and jth samples but not otherwise, and similarly u_ijk, etc.

Denoting a subset of the integers 1, 2, ..., s by w, let
\[ r = \sum_w u_w = \sum_i u_i + \sum_{i<j} u_{ij} + \sum_{i<j<k} u_{ijk} + \dots, \]
the total number of different individuals caught in the complete experiment.

Let a_i be the size of the ith sample. Then a_i = Σ u_w, where summation is over all subsets w which include the integer i.

We derive two probability distributions for {u,}. 

Let the probability that any individual is caught in the ith sample be p_i = 1 − q_i. Thus we assume that all individuals are equally likely to be members of any given sample. Further, we shall assume that, for any individual, the events: caught in the ith sample, i = 1, 2, ..., s, are independent.

The probability of any individual escaping capture throughout the experiment is
\[ \prod_i q_i = Q, \text{ say}. \]
The probability of being caught in the i, ..., l samples and no others is
\[ \frac{p_i}{q_i} \cdots \frac{p_l}{q_l} \, Q = P_w, \text{ say}. \]
Clearly, the probability density of {u_w} is multinomial, viz.
\[ p[\{u_w\}] = \frac{n!}{(n - r)! \prod_w u_w!} \, Q^{n - r} \prod_w P_w^{u_w}, \qquad (A) \]
where 0 ≤ u_w ≤ n subject to 0 ≤ r = Σ_w u_w ≤ n. We notice that
\[ Q^{n - r} \prod_w P_w^{u_w} = Q^n \prod_w \left( \frac{P_w}{Q} \right)^{u_w} = Q^n \prod_i \left( \frac{p_i}{q_i} \right)^{a_i} = \prod_i p_i^{a_i} q_i^{n - a_i}. \]
Therefore, (A) may also be written
\[ p[\{u_w\}] = \frac{n!}{(n - r)! \prod_w u_w!} \prod_i p_i^{a_i} q_i^{n - a_i}. \]

We now find p[{u_w} | {a_i}], the conditional density of {u_w} given {a_i}. It is obvious from first considerations, and is easily deduced from (A), that the a_i are independent binomial variables B[n, p_i]. Therefore
\[ p[\{a_i\}] = \prod_i \binom{n}{a_i} p_i^{a_i} q_i^{n - a_i} \]
and
\[ p[\{u_w\} \mid \{a_i\}] = \frac{n!}{(n - r)! \prod_w u_w!} \prod_i \binom{n}{a_i}^{-1}, \qquad (B) \]
where max_i(a_i) ≤ r ≤ Σ_i a_i (strictly, min(n, Σ_i a_i)), 0 ≤ u_i ≤ a_i, 0 ≤ u_ij ≤ min(a_i, a_j), etc., with the linear constraints on the u_w
\[ \sum_{w \ni i} u_w = a_i \qquad (i = 1, 2, \dots, s). \]
(B) is a generalized hypergeometric density.

2-2. Which of models (A) and (B) is appropriate to any given experiment?

In (A) the sample sizes a; are random variables while the p; are parameters. This model 
is therefore applicable when the effort put into the catching of every sample is fixed before 
the experiment begins since the p; are then fixed, though unknown. (B), on the other hand, 
involves the a_i as parameters and should be used only when the experimenter is determined
to catch no more and no less than a; individuals at the ith sample; and he will only be able 
to do this when animals are fairly easily caught. In fact, if we had to generalize, we could 
say that (B) is likely to be appropriate when the main limiting factor on sample size is the 
trouble involved in marking animals and (A) when it is the difficulty in catching them. 

Most previous work has been based on (B) and (A) is new. (Hammersley (1953) con- 
structed a model in which the a; are binomial variables but this model involves a flaw which 
invalidates the estimation based on it, as we shall show in paper II.) It has been customary 
to derive (B) as a chain of s − 1 hypergeometric probabilities P[S_i | S_1, S_2, ..., S_{i−1}] and this
has led to its simplicity being obscured either by the notation employed, by considering 
only the terms involving n or by making sampling-with-replacement approximations. 

As well as being the exact probability description of the capture-recapture experiment when the a_i are constants, (B) may also be regarded as a very useful device for eliminating the nuisance parameters p_i when the a_i are variables; it leaves only n to be estimated and provides a sufficient statistic for n, namely r. One feels intuitively that to estimate n as if the a_i are constants, when in fact they are not, is not a serious misrepresentation, and this feeling is strengthened by the discovery that the two models lead to the same estimate n̂ of n, and to the same asymptotic estimate of var(n̂). Apart from demonstrating this, it may be wondered why there is any need to consider (A) at all. The main reason is that (A) is capable of generalizations which (B) is unable to accommodate and it is necessary to discuss (A) for the closed population before going on to these generalizations, some of which



are the subject of paper II. Also, the ease with which a multinomial probability is mani- 
pulated gives it considerable advantage over a hypergeometric probability, even for a 
closed population. 


3. ESTIMATION USING MODEL (B)


3-1. The fact that r is sufficient for n has the important implication that n can be estimated from similar tagging.
Regarding (B) as the likelihood L(n) of n, and omitting constant terms,
\[ \log L(n) = \sum_i \log (n - a_i)! - (s - 1) \log n! - \log (n - r)!. \]
i 


An equation for the maximum likelihood estimate n̂ of n can be found by equating Δ log L(n) to zero. This involves an error of less than unity in the solution and is equivalent to the ratio method of maximizing L, which equates L(n) to L(n − 1). Since Δ log n! = log n, n̂ must be one of the roots of
\[ \prod_i (n - a_i) = n^{s-1} (n - r). \qquad (1) \]

(1) has a single finite root greater than r which maximizes the likelihood, except when r takes one of its extreme values. (i) If r = Σ_i a_i, no individual is observed more than once and n̂ is infinite. (ii) If r = max_i(a_i) = a_m say, no individual is observed which does not appear in the mth sample and n̂ = r = a_m. It is of course in the nature of the capture-recapture experiment that (i) and (ii) are extremely unlikely to occur.

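(1) may also be obtained by equating r to its expected value ρ, say. For
\[ n - \rho = E[n - r] = \frac{\prod_i (n - a_i)}{n^{s-1}}. \qquad (2) \]
This follows from the identity in n and the a_i, which expresses the fact that (B) sums to unity,
\[ \sum \frac{n!}{(n - r)! \prod_w u_w!} = \prod_i \binom{n}{a_i}, \]
the sum being over all {u_w} satisfying the constraints of (B); for, applying it with n − 1 in place of n,
\[ E[n - r] = \frac{\sum n! / \{(n - r - 1)! \prod_w u_w!\}}{\prod_i \binom{n}{a_i}} = \frac{n \prod_i \binom{n-1}{a_i}}{\prod_i \binom{n}{a_i}} = \frac{\prod_i (n - a_i)}{n^{s-1}}. \]
Similarly,
\[ E[(n - r)(n - r - 1)] = \frac{\prod_i (n - a_i)(n - a_i - 1)}{\{n(n - 1)\}^{s-1}}, \qquad (3) \]
with corresponding expressions for the higher factorial moments of n − r.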

3-2. To apply maximum likelihood large-sample theory in finding the variance of an estimate, it is necessary that the following three conditions are fulfilled. (a) The sample size must be a constant. (b) The likelihood must consist merely of the product of the individual likelihoods for the separate sample members. (c) The range of summation of the random variables must be independent of the unknown parameters. Except for one model discussed in § 5-7, which is artificially constructed for the purpose, no other model of this paper satisfies these three conditions. In the present context r is the sample size, as distinct from the a_i which are the separate catch sizes, and (B) obviously does not satisfy (a) or (b). (It does satisfy condition (c) provided n > Σ_i a_i.) Model (A) breaks all three conditions.
í 


We can, however, use the δ-technique to find the asymptotic variance and bias of n̂. (1) may be regarded as defining n̂ as a function n̂(r) of r. By (2), n̂(ρ) = n and we may therefore expand n̂ about n as a Taylor series in powers of r − ρ. If we consider n̂ and r as continuous variables we can say that dn̂/dr is finite and differentiable in the range
\[ \max_i (a_i) \le r \le \sum_i a_i - 1. \]
Confining attention to this range, that is ignoring the possibility of (i) occurring, we have
\[ \hat n - n = (r - \rho) \left( \frac{d\hat n}{dr} \right)_{\rho} + \tfrac12 (r - \rho)^2 \left( \frac{d^2 \hat n}{dr^2} \right)_{r'}, \qquad (4) \]
where r' lies between r and ρ. Differentiating (1),
\[ \left( \frac{d\hat n}{dr} \right)_{\rho} = \left[ 1 + (n - \rho) \left\{ \frac{s - 1}{n} - \sum_i \frac{1}{n - a_i} \right\} \right]^{-1} = O(1), \]
where f(n) = O(φ(n)) means |f(n)| < Kφ(n) as n → ∞ and each a_i → ∞ in such a way that the a_i/n, and hence also ρ/n, are constant. Also
\[ \frac{d^2 \hat n}{dr^2} = O\left( \frac{1}{n} \right). \]
Further,
\[ \operatorname{var}(r) = \operatorname{var}(n - r) = E[(n - r)(n - r - 1)] + (n - \rho) - (n - \rho)^2. \]
Therefore, using (3),
\[ \operatorname{var}(r) \sim (n - \rho) \left[ 1 + (n - \rho) \left\{ \frac{s - 1}{n} - \sum_i \frac{1}{n - a_i} \right\} \right] = O(n), \qquad (5) \]


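where f(n) ~ φ(n) denotes that f(n) = φ(n)[1 + O(n⁻¹)]. Making further use of factorial moments we find that
\[ E[(r - \rho)^3] = O(n) \quad \text{and} \quad E[(r - \rho)^4] = O(n^2). \]
Squaring (4) and taking expected values,
\[ E[(\hat n - n)^2] \sim \operatorname{var}(r) \left( \frac{d\hat n}{dr} \right)_{\rho}^2. \]
(The error in replacing E[(r − ρ)²] over the restricted range by var(r) is O(n²cⁿ), 0 < c < 1, which is o(n⁻¹).) Thus, for the limit process stated,
\[ E[(\hat n - n)^2] \sim \left[ \frac{1}{n - \rho} + \frac{s - 1}{n} - \sum_i \frac{1}{n - a_i} \right]^{-1} = O(n). \qquad (6) \]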

Let b = E(n̂) − n, the positive bias of n̂. Then, extending the Taylor series by one term and taking expected values, we find that
\[ b \sim \tfrac12 \operatorname{var}(r) \left( \frac{d^2 \hat n}{dr^2} \right)_{\rho} = O(1). \qquad (7) \]
Since b = O(1) and E[(n̂ − n)²] = O(n), E[(n̂ − n)²] ~ var(n̂). Thus, it makes no difference whether we speak of mean square error or of variance, and (6) is equivalent to
\[ \operatorname{var}(\hat n) \sim \left[ \frac{1}{n - \rho} + \frac{s - 1}{n} - \sum_i \frac{1}{n - a_i} \right]^{-1}. \qquad (8) \]
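A minimal sketch of the estimation procedure of § 3, assuming the catch sizes a_i and the number r of distinct individuals are known. Bisection on the logarithm of the two sides of (1) is an illustrative numerical choice, not the paper's, and (8) is evaluated at n = n̂, where ρ = r; the function names are hypothetical. `math.prod` (used in the test) needs Python 3.8+.

```python
import math

def darroch_mle(a, r, tol=1e-10):
    """Root n > r of prod(n - a_i) = n**(s - 1) * (n - r), equation (1).

    f(n) below is the logarithm of the ratio of the two sides; it tends to
    +infinity as n -> r from above and is negative for large n when
    r < sum(a_i). Returns math.inf in case (i), r == sum(a_i).
    """
    s = len(a)
    if r >= sum(a):
        return math.inf

    def f(n):
        return (sum(math.log(n - ai) for ai in a)
                - (s - 1) * math.log(n) - math.log(n - r))

    lo, hi = float(r), 2.0 * r
    while f(hi) > 0:                 # root lies above hi: move the bracket up
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def darroch_var(n_hat, a, r):
    """Estimate of var(n_hat) from (8), taking n = n_hat and hence rho = r (needs n_hat > r)."""
    s = len(a)
    info = 1.0 / (n_hat - r) + (s - 1) / n_hat - sum(1.0 / (n_hat - ai) for ai in a)
    return 1.0 / info


# Hypothetical two-sample census: 50 marked, 40 in the second sample, 10 recaptures.
n_hat = darroch_mle([50, 40], 80)
print(round(n_hat, 6), round(darroch_var(n_hat, [50, 40], 80), 3))
```

In case (ii), r = max(a_i), the bisection collapses onto n̂ = r as the text states, but (8) is then undefined.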


3-3. When s = 2, r = a₁ + a₂ − u₁₂ and (B), written in standard hypergeometric form, is
\[ p[u_{12}] = \binom{a_1}{u_{12}} \binom{n - a_1}{a_2 - u_{12}} \Big/ \binom{n}{a_2}. \]
(1) is a linear equation with solution n̂ = a₁a₂/u₁₂, the familiar Petersen estimate.
Chapman (1951) showed that
\[ n' = \frac{(a_1 + 1)(a_2 + 1)}{u_{12} + 1} - 1 \]
is preferable to n̂, since it is always finite and is almost unbiased. This could very nearly be inferred from (7). For that formula gives (n − a₁)(n − a₂)/(a₁a₂) as the approximate bias of n̂, which is estimated by (a₁ − u₁₂)(a₂ − u₁₂)/u₁₂², and
\[ \hat n - n' = \frac{(a_1 - u_{12})(a_2 - u_{12})}{u_{12}(u_{12} + 1)}. \]

We notice that n' is the solution of (n − a₁)(n − a₂) = (n + 1)(n − r), but unfortunately it is not true that Π_i(n − a_i) = (n + 1)^{s−1}(n − r) yields an almost unbiased estimate for general values of s.

Chapman (1952) showed how n’ can be made the basis of almost unbiased s-sample 
estimation. We shall wait until § 5-3 to comment on his recommendations, as they can be 
more easily discussed for model (A) than for (B). 
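The two-sample relations of § 3-3 can be checked numerically. The function names and the counts below are hypothetical; the formulas are those given above for n̂, n' and the estimated bias.

```python
def petersen(a1, a2, u12):
    """Two-sample estimate n_hat = a1 * a2 / u12, the solution of (1) when s = 2."""
    return a1 * a2 / u12

def chapman(a1, a2, u12):
    """Chapman's (1951) modified estimate n' = (a1 + 1)(a2 + 1)/(u12 + 1) - 1."""
    return (a1 + 1) * (a2 + 1) / (u12 + 1) - 1

def estimated_bias(a1, a2, u12):
    """Bias (n - a1)(n - a2)/(a1 a2) from (7), with n replaced by the Petersen estimate."""
    return (a1 - u12) * (a2 - u12) / u12 ** 2

# Hypothetical counts: a1 = 50, a2 = 40, u12 = 10.
print(petersen(50, 40, 10), round(chapman(50, 40, 10), 2), estimated_bias(50, 40, 10))
# 200.0 189.09 12.0
```

The gap n̂ − n' (about 10.9 here) is close to the estimated bias of 12, which is the sense in which the near-unbiasedness of n' "could very nearly be inferred from (7)".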


3-4. Turning attention now from point estimation to confidence interval estimation of n, we assume that r is approximately normally distributed about ρ. We have already noted that r has moments μ₂ = O(n) and μ₃ = O(n). Therefore
\[ \gamma_1 = \mu_3 / \mu_2^{3/2} = O(n^{-1/2}). \]
Also, one finds that γ₂ = μ₄/μ₂² − 3 = O(n⁻¹).

The expected value of r, regarded as a function of n, is
\[ \rho(n) = n - \frac{\prod_i (n - a_i)}{n^{s-1}}, \]
and in this notation, the equation for n̂ is ρ(n̂) = r, or n̂ = ρ⁻¹(r), say. Let σ²(n) denote the variance of r, given by (5). Then
\[ P[r - k\sigma(n) \le \rho(n) \le r + k\sigma(n)] = 1 - \epsilon, \]
where k = k(ε) may be read from normal tables. The inequalities are approximately
\[ r - k\sigma(\hat n) \le \rho(n) \le r + k\sigma(\hat n), \]
that is, r₁ ≤ ρ(n) ≤ r₂, say, or, since ρ⁻¹(r) can be shown to be a monotone increasing function of r,
\[ \rho^{-1}(r_1) \le n \le \rho^{-1}(r_2). \]
ρ⁻¹(r₁) and ρ⁻¹(r₂) (which are precisely the same as n̂(r₁) and n̂(r₂)) may be regarded as first approximations to the solutions for n of r − kσ(n) = ρ(n) and r + kσ(n) = ρ(n), respectively. We may now, if we wish, proceed to better approximations n₁* and n₂*, say, obtaining
\[ n_1^* \le n \le n_2^* \]
as the 100(1 − ε)% confidence interval for n.
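The "better approximations" n₁* and n₂* can be found by solving the two equations directly. In this sketch (hypothetical names, Python 3.8+ for `math.prod`) σ²(n) is computed from the exact factorial moments (2) and (3) rather than the asymptotic form (5), plain bisection is used, and z = 1.96 gives a nominal 95% interval; these are all assumptions of the illustration.

```python
import math

def rho(n, a):
    """Expected number of distinct individuals, rho(n) = n - prod(n - a_i) / n**(s - 1)."""
    return n - math.prod(n - ai for ai in a) / n ** (len(a) - 1)

def sigma(n, a):
    """Standard deviation of r under (B), from the factorial moments (2) and (3)."""
    s = len(a)
    m1 = math.prod(n - ai for ai in a) / n ** (s - 1)                               # E[n - r]
    m2 = math.prod((n - ai) * (n - ai - 1) for ai in a) / (n * (n - 1)) ** (s - 1)  # E[(n-r)(n-r-1)]
    return math.sqrt(max(m2 + m1 - m1 * m1, 0.0))

def bisect(g, lo, hi, iters=200):
    """Plain bisection; assumes g(lo) < 0 <= g(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def confidence_interval(a, r, n_hat, z=1.96, n_max=1e7):
    """Solve r - z*sigma(n) = rho(n) (lower limit) and r + z*sigma(n) = rho(n) (upper limit)."""
    def g_lower(n):
        return rho(n, a) + z * sigma(n, a) - r

    def g_upper(n):
        return rho(n, a) - z * sigma(n, a) - r

    lower = bisect(g_lower, float(max(a)), n_hat)
    hi = n_hat
    while g_upper(hi) < 0:           # push the bracket out until rho - z*sigma exceeds r
        hi *= 2.0
        if hi > n_max:
            return lower, math.inf
    return lower, bisect(g_upper, n_hat, hi)

# Hypothetical census: a = (50, 40), r = 80, for which n_hat = 200.
print(confidence_interval([50, 40], 80, 200.0))
```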


4. ONE INDIVIDUAL PER CAPTURE 


4-1. When each sample is of size one, (B) is the obvious probability model to use, though (A) can be adapted for the purpose as we shall show in § 5-6. (B) is the basis of the present section and is equal to
\[ \frac{1}{n^s} \frac{n!}{(n - r)!}, \qquad (B_1) \]
since every u_w = 0 or 1 and therefore every u_w! = 1, so that Π_w u_w! = 1.
Summing (B₁) over all values of {u_w} such that Σ_i u_i = f₁, Σ_{i<j} u_ij = f₂, ..., we obtain
\[ \frac{1}{n^s} \frac{n!}{(n - r)!} \frac{s!}{(1!)^{f_1} (2!)^{f_2} \cdots f_1! \, f_2! \cdots} \qquad (9) \]
as the probability of not catching n − r individuals and of catching f_x individuals x times, where x = 1, 2, ..., s and Σ_x f_x = r, Σ_x x f_x = s. The step from (B₁) to (9) can be made by considering the number of ways of distributing s balls in r cells in such a way that none is empty. The argument, which need not be included here, follows from putting u_i = 1 if the ith ball is alone, u_ij = 1 and u_i = u_j = 0 if it is with the jth and no other, etc.

Summing (9) over all values of {f,} such that Y, f, = r and Lethe = s, we obtain (Jordan, 
1947, p. 206) a s 

n? (n—r)! % uo 

as the probability of catching r individuals with s samples, where 7; = Ar(0*)/r!, a Stirling 
number of the second kind. (10) was found by Craig (1953) when considering the estimation 
of a population of butterflies. 

4-2. For the purpose of estimating n, little alteration is required to the general results of §§ 3-2, 3-4. $\tilde n$ is the solution of

$$(n-1)^s = n^{s-1}(n-r), \qquad (11)$$

which may be approximately written

$$e^{-s/n} = 1 - r/n.$$

The appropriate limit process is now $n \to \infty$, $s \to \infty$ such that $s/n$ is constant, and we find that

$$\operatorname{var}(\tilde n) \simeq \frac{n}{e^{s/n} - 1 - s/n} = O(n),$$

$$E[\tilde n] - n \simeq \frac{1}{2}\left(\frac{s}{n}\right)^2\left[e^{s/n} - 1 - \frac{s}{n}\right]^{-2} = O(1).$$

Confidence interval estimation may be performed with

$$\mu(n) = n\left[1 - e^{-s/n}\right], \qquad \sigma^2(n) = n e^{-2s/n}\left[e^{s/n} - 1 - s/n\right].$$
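For the one-per-capture case, here is a small sketch that solves (11) by bisection and evaluates the large-sample variance, bias, mean and standard deviation given above. The function names are hypothetical. Plugging $\tilde n$ in for the unknown n in these formulas is an additional approximation that the sketch assumes.

```python
import math


def n_tilde_single(s, r, iters=200):
    """Solve (n - 1)**s = n**(s-1) * (n - r), equation (11), for continuous n.

    Bisection on g(n) = s*log(1 - 1/n) - log(1 - r/n); a finite root above r
    exists only when s > r (at least one repeat capture).
    """
    if s <= r:
        return math.inf

    def g(n):
        return s * math.log1p(-1.0 / n) - math.log1p(-r / n)

    lo = r * (1 + 1e-12)
    if g(lo) <= 0:
        return float(r)
    hi = 2.0 * r
    while g(hi) > 0:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def asymptotic_var_and_bias(n, s):
    """var(n-tilde) ~ n / (e^x - 1 - x) and bias ~ x^2 / (2 (e^x - 1 - x)^2), with x = s/n."""
    x = s / n
    denom = math.expm1(x) - x  # e^{s/n} - 1 - s/n
    return n / denom, 0.5 * x * x / denom ** 2


def mu_sigma(n, s):
    """Approximate mean and standard deviation of r used for confidence intervals."""
    x = s / n
    mean = n * (1 - math.exp(-x))
    var = n * math.exp(-2 * x) * (math.expm1(x) - x)
    return mean, math.sqrt(var)


n_hat = n_tilde_single(s=200, r=150)
print(n_hat, asymptotic_var_and_bias(n_hat, 200))
```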
4-3. Suppose now that instead of s being fixed and r variable, sampling is continued until a fixed number, r, of individuals have been caught. Then

$$P\{r-1 \text{ individuals in } s-1 \text{ samples and a new one at the } s\text{th sample}\} = \frac{n!}{n^{s-1}(n-r+1)!}\,\sigma_{s-1}^{r-1}\,\frac{n-r+1}{n} = \frac{n!}{n^s(n-r)!}\,\sigma_{s-1}^{r-1}, \qquad (12)$$

where $s = r, r+1, \ldots$.

This model will be referred to as inverse, since Goodman has already given the term sequential to his census, which we discuss in § 4-4. We remark that the maximum likelihood estimate of n remains the same as for the direct model (10).

Let $\phi(t)$ be the probability generating function of s. Now

$$\sum_s \sigma_s^r t^s = \frac{t^r}{(1-t)(1-2t)\cdots(1-rt)}$$

(Jordan, p. 175). It follows that

$$\phi(t) = \frac{(n-1)!}{(n-r)!}\,\frac{t^r}{(n-t)(n-2t)\cdots(n-(r-1)t)} = \prod_{k=0}^{r-1}\frac{(n-k)t}{n-kt}.$$

Differentiating $\phi(t)$,

$$E[s] = \sum_{k=0}^{r-1}\frac{n}{n-k}, \qquad (13)$$

$$\operatorname{var}(s) = \sum_{k=0}^{r-1}\frac{nk}{(n-k)^2}. \qquad (14)$$

The method of moments estimation equation, obtained by equating observed and expected s, is

$$\sum_{k=0}^{r-1}\frac{1}{\tilde n - k} = \frac{s}{\tilde n}. \qquad (15)$$

As Craig pointed out, (15) is the exact maximum likelihood equation for the likelihood $n!/\{n^s(n-r)!\}$ with n treated as a continuous variable. The solutions of (11) and (15) therefore differ by one at most. (15) has no solution $\tilde n > r$ when $s > r(1 + \tfrac12 + \cdots + 1/r) = s_0(r)$, say. As far as the method of moments interpretation of (15) is concerned, this is explained by the fact that, since $n \ge r$, $E[s] \le s_0(r)$. That is, there is no expected value of s to which an observed value greater than $s_0(r)$ corresponds, and it is therefore meaningless to equate them. It is not likely that s will ever be greater than $s_0(r)$ in practice. $s_0(100)$, for instance, is 519. Before making 519 catches to obtain 100 individuals, the experimenter would be sure to doubt the randomness of his sampling or the correctness of his (necessary) information that $n \ge 100$.

For the likelihood (12), s is sufficient for n. Ignoring the possibility that $s = r$ (which makes $\tilde n = \infty$), and using (14) and the same technique as in § 3-2, we find that

$$\operatorname{var}(\tilde n) \simeq n\left[\sum_{k=1}^{r-1}\frac{k}{(n-k)^2}\right]^{-1} = O(n),$$

$$E[\tilde n] - n \simeq n\sum_{k=1}^{r-1}\frac{k}{(n-k)^3}\left[\sum_{k=1}^{r-1}\frac{k}{(n-k)^2}\right]^{-2} = O(1),$$

where the limit process is now $n \to \infty$, $r \to \infty$ such that $r/n$ is constant.

It is readily shown that s is normally distributed, neglecting terms $O(n^{-1/2})$. Therefore, confidence interval estimation of n based on (13) and (14) proceeds as in § 3-4. The one difference is that $\tilde n = \mu^{-1}(s)$, with $\mu(n) = E[s]$ given by (13), is now a monotone-decreasing function of s.

All of the formulae appearing in this subsection may of course be simplified by using 
integral approximations to the sums of reciprocal powers of n — k. 
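The inverse-model quantities can be evaluated directly. This is a minimal sketch using (13), (14), the moment equation (15), the bound $s_0(r)$ and the asymptotic variance and bias of $\tilde n$. All function names are hypothetical, n is treated as continuous, and the asymptotic formulas need $r \ge 2$.

```python
import math


def expected_s(n, r):
    """E[s] from (13): sum over k = 0..r-1 of n / (n - k)."""
    return sum(n / (n - k) for k in range(r))


def var_s(n, r):
    """var(s) from (14): sum over k = 0..r-1 of n k / (n - k)**2."""
    return sum(n * k / (n - k) ** 2 for k in range(r))


def s0(r):
    """s_0(r) = r (1 + 1/2 + ... + 1/r), the value of E[s] at n = r."""
    return r * sum(1.0 / j for j in range(1, r + 1))


def n_tilde_inverse(s, r, iters=200):
    """Solve the moment equation (15), E[s] = s, for continuous n >= r.

    E[s] decreases from s_0(r) at n = r towards r as n grows, so a finite
    root exists only for r < s <= s_0(r).
    """
    if s <= r:
        return math.inf  # every catch was new: n-tilde is infinite
    if s > s0(r):
        raise ValueError("s exceeds s_0(r): equation (15) has no solution n >= r")
    lo, hi = float(r), 2.0 * r
    while expected_s(hi, r) > s:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_s(mid, r) > s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def asymptotic_var_and_bias(n, r):
    """var(n-tilde) ~ n / A and bias ~ n B / A**2, with A = sum k/(n-k)^2, B = sum k/(n-k)^3."""
    A = sum(k / (n - k) ** 2 for k in range(1, r))
    B = sum(k / (n - k) ** 3 for k in range(1, r))
    return n / A, n * B / A ** 2


print(round(s0(100)))  # 519
```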


4-4. The sequential census of Goodman (1953) can be described as follows. Before the experiment is begun, an infinite sequence $\{a_i\}$ of sample sizes is postulated, together with l, the number of tagged individuals to be recovered. Sampling stops at the completion of the sth sample, s being defined by

$$\sum_{i=1}^{s-1} a_i - r_{s-1} < l, \qquad \sum_{i=1}^{s} a_i - r_s \ge l,$$

where $r_s$ denotes the number of individuals observed in the first s samples.

When all $a_i = 1$, sampling stops as soon as $s - r = l$. We comment briefly on this particular case using the approach of the present section. We require

$$P\{r \text{ individuals in } r+l-1 \text{ samples and a previously caught individual at the } (r+l)\text{th}\} = \frac{n!\,r\,\sigma_{r+l-1}^r}{n^{r+l}(n-r)!}. \qquad (16)$$

Maximum likelihood estimation remains the same as before, and r or s (= r + l) is sufficient for n. There is, however, a minimum-variance unbiased estimate. Since

$$\sum_r \frac{n!\,r\,\sigma_{r+l-1}^r}{n^{r+l}(n-r)!} = 1$$

is an identity in n for any positive integer l,

$$\sum_r \frac{n!\,r\,\sigma_{r+l}^r}{n^{r+l}(n-r)!} = n\sum_r \frac{n!\,r\,\sigma_{r+l}^r}{n^{r+l+1}(n-r)!} = n.$$

Therefore $\sigma_{r+l}^r/\sigma_{r+l-1}^r$ is an unbiased estimate of n. Moreover, it is uniquely unbiased, as is easily seen by induction on n. Because it is a function of the sufficient statistic r, it has minimum variance (Rao, 1952).
Using more general methods, Goodman expressed the same estimate as $K(r, l)/K(r, l-1)$, where $K(r, 0) = r$ and $K(r, l) = r\sum_{t=1}^{r} K(t, l-1)$. The first step is to express $\sigma_{r+l}^r$ through the difference operator $\Delta^r$ (Jordan, p. 171). Defining $\Delta^{-1}$ as the indefinite sum, determined up to an additive constant (Jordan, p. 101), it then follows that $K(r, l) = r\sigma_{r+l}^r$, which accounts for the equivalence of the two expressions for the unbiased estimate. Goodman made reference to tables facilitating the calculation of this estimate.

He also provided another basis for estimation by showing that as $n \to \infty$, l remaining constant, the distribution of $s^2/n$ tends to that of $\chi^2_{2l}$.
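The unbiasedness argument can be checked exactly for small n with rational arithmetic. This sketch sums the estimate $\sigma_{r+l}^r/\sigma_{r+l-1}^r$ against the probabilities (16). It also confirms that Goodman's recursion reproduces $K(r, l) = r\sigma_{r+l}^r$. The function names are hypothetical, and the Stirling numbers come from the standard recurrence $\sigma_s^r = \sigma_{s-1}^{r-1} + r\sigma_{s-1}^r$.

```python
from fractions import Fraction
from functools import lru_cache
from math import factorial


@lru_cache(maxsize=None)
def stirling2(s, r):
    """Stirling number of the second kind sigma_s^r (partitions of s objects into r blocks)."""
    if s == r:
        return 1
    if r == 0 or r > s:
        return 0
    return stirling2(s - 1, r - 1) + r * stirling2(s - 1, r)


@lru_cache(maxsize=None)
def goodman_K(r, l):
    """Goodman's K(r, l): K(r, 0) = r and K(r, l) = r * sum_{t=1}^{r} K(t, l - 1)."""
    if l == 0:
        return r
    return r * sum(goodman_K(t, l - 1) for t in range(1, r + 1))


def goodman_estimate(r, l):
    """Unbiased estimate sigma_{r+l}^r / sigma_{r+l-1}^r as an exact fraction."""
    return Fraction(stirling2(r + l, r), stirling2(r + l - 1, r))


def prob_r(r, n, l):
    """Probability (16) that sampling (one individual per sample) stops with r distinct individuals."""
    return Fraction(factorial(n) * r * stirling2(r + l - 1, r),
                    n ** (r + l) * factorial(n - r))


def exact_mean(n, l):
    """Exact E[estimate] under (16), summing over r = 1..n."""
    return sum(prob_r(r, n, l) * goodman_estimate(r, l) for r in range(1, n + 1))


print(exact_mean(6, 2))  # 6
```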


4-5. Does the ratio of two Stirling numbers afford an unbiased estimate of n for the direct or inverse models? The answer is no, except in one unimportant instance: for the direct census with $s \ge n$, $\sigma_{s+1}^r/\sigma_s^r$ is unbiased and has minimum variance.
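The exception can be verified exactly from (10). For $s \ge n$ the expectation of $\sigma_{s+1}^r/\sigma_s^r$ equals n. For $s < n$ the term $r = s+1$ is unattainable, so the expectation falls short of n by $n\,P_{s+1}(r = s+1)$. This is a hedged sketch with hypothetical function names.

```python
from fractions import Fraction
from functools import lru_cache
from math import factorial


@lru_cache(maxsize=None)
def stirling2(s, r):
    """Stirling number of the second kind sigma_s^r."""
    if s == r:
        return 1
    if r == 0 or r > s:
        return 0
    return stirling2(s - 1, r - 1) + r * stirling2(s - 1, r)


def direct_prob(r, n, s):
    """Probability (10) of catching r distinct individuals in s one-at-a-time samples."""
    return Fraction(factorial(n) * stirling2(s, r), n ** s * factorial(n - r))


def mean_ratio(n, s):
    """Exact E[sigma_{s+1}^r / sigma_s^r] under (10), over the attainable r = 1..min(n, s)."""
    return sum(direct_prob(r, n, s) * Fraction(stirling2(s + 1, r), stirling2(s, r))
               for r in range(1, min(n, s) + 1))


print(mean_ratio(4, 5), mean_ratio(6, 3))  # 4 13/3
```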
The application of the three models of this section to the estimation of the number of classes may be framed in the language of coupon collecting. (10), (12) and (16) correspond, respectively, to the collection ceasing when the number of coupons (s), the number of different kinds of coupon (r) and the number of 'swop' coupons (s − r) reach prescribed values.


5. ESTIMATION USING MODEL A 


5-1. It will be seen from (A′) that r and $\{a_i\}$ are jointly sufficient for n and $\{p_i\}$. Confidence interval estimation of n is therefore no longer a practical possibility.

Differentiating (A′) with respect to $p_i$, we find that the maximum-likelihood estimate of $p_i$ is

$$\hat p_i = a_i/\tilde n.$$

Hence, $\tilde n$ is again the solution of (1).

The same estimation equations may also be deduced by the method of moments, for $E[a_i] = np_i$ and $E[n-r] = nQ$.

The derivation of formulae for the variance and bias of $\tilde n$ is much the same as for model (B). For convenience, write $r = a_{s+1}$ and $Q = q_{s+1} = 1 - p_{s+1}$. The solution of (1) is then $\tilde n = \tilde n(\{a_x\})$, $x = 1, 2, \ldots, s+1$, with $\tilde n(\{np_x\}) = n$, and

$$\tilde n - n = \sum_x (a_x - np_x)\frac{\partial\tilde n}{\partial a_x} + \frac{1}{2}\sum_x\sum_y (a_x - np_x)(a_y - np_y)\frac{\partial^2\tilde n}{\partial a_x\,\partial a_y} + \cdots, \qquad (17)$$

where all derivatives are evaluated at $\{a_x\} = \{np_x\}$. It can be shown that any derivative of $\tilde n$ of order k is $O(n^{1-k})$ for the limit process $n \to \infty$, $\{p_i\}$ constant. It can also be shown that all multinomial moments of order 2l are $O(n^l)$, and that those of order 2l + 1 are also $O(n^l)$. These two facts, combined with several pages of tedious algebra, lead to

$$\operatorname{var}(\tilde n) \simeq n\left[\frac{1}{Q} + s - 1 - \sum_i\frac{1}{q_i}\right]^{-1} \qquad (18)$$

and

$$E[\tilde n] - n \simeq \frac{\left(s - 1 - \sum_i q_i^{-1}\right)^2 + s - 1 - \sum_i q_i^{-2}}{2\left[Q^{-1} + s - 1 - \sum_i q_i^{-1}\right]^2}. \qquad (19)$$
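Formulae (18) and (19) are easy to evaluate for given capture probabilities. This is a minimal sketch with a hypothetical function name. As a check, for s = 2 it reduces to the familiar two-sample values $nq_1q_2/(p_1p_2)$ for the variance and $q_1q_2/(p_1p_2)$ for the bias.

```python
import math


def var_and_bias_model_a(n, p):
    """Asymptotic var(n-tilde), formula (18), and bias E[n-tilde] - n, formula (19).

    p holds the capture probabilities p_1..p_s (s >= 2); q_i = 1 - p_i, Q = prod q_i.
    """
    q = [1.0 - pi for pi in p]
    s = len(q)
    Q = math.prod(q)
    inv_q = sum(1.0 / qi for qi in q)
    inv_q2 = sum(1.0 / qi ** 2 for qi in q)
    D = 1.0 / Q + s - 1 - inv_q  # the bracket in (18)
    var = n / D
    bias = ((s - 1 - inv_q) ** 2 + s - 1 - inv_q2) / (2.0 * D ** 2)
    return var, bias


# For s = 2 these reduce to n q1 q2 / (p1 p2) and q1 q2 / (p1 p2).
print(var_and_bias_model_a(1000, [0.3, 0.4]))
```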
la Yn 
5-2. When s = 2,

$$n' = \frac{(a_1+1)(a_2+1)}{u_{11}+1} - 1$$

is again an almost unbiased estimate of n. For

$$P\{u_{00}, u_{01}, u_{10}, u_{11}\} = \frac{n!}{u_{00}!\,u_{01}!\,u_{10}!\,u_{11}!}\,Q^{u_{00}}P_{01}^{u_{01}}P_{10}^{u_{10}}P_{11}^{u_{11}},$$

where $Q = q_1q_2$, $P_{10} = p_1q_2$, $P_{01} = q_1p_2$ and $P_{11} = p_1p_2$, and it follows easily that

$$E[n'] = n - nq_1q_2(1 - p_1p_2)^{n-1}. \qquad (20)$$

Since $(1-p_1p_2)^{n-1}$ is well approximated by $e^{-np_1p_2}$,

$$E[n'] = n - E[n-r]\,e^{-E[u_{11}]} \qquad (21)$$

to a good approximation. The negative bias of n′ will in general be small.

Consider now the conditional expectation of n′ given $a_1$. This leads to a slightly different statement of the last result, which we shall require in § 5-3. For this purpose write

$$n' = a_1 + (a_1+1)\frac{u_{01}}{u_{11}+1}.$$

Given $a_1$, $u_{01}$ and $u_{11}$ are independent binomial variables $B[n-a_1, p_2]$ and $B[a_1, p_2]$, respectively, and

$$E\left[\frac{1}{u_{11}+1}\,\middle|\,a_1\right] = \frac{1 - q_2^{a_1+1}}{(a_1+1)p_2}.$$

Therefore

$$E[n' \mid a_1] = a_1 + (n-a_1)\left[1 - q_2^{a_1+1}\right] = n - (n-a_1)q_2(1-p_2)^{a_1} \qquad (22)$$

$$= n - E[n-r \mid a_1]\,e^{-a_1p_2} \qquad (23)$$

to the same approximation as before. The expectation of (22) over $a_1$ plainly gives (20) again. The inference we wish to make, however, is that the bias may be neglected after taking only the conditional expectation and, what is more, that the difference between (22) and (20) is negligible.

From (18),

$$\operatorname{var}(n') \simeq n\left[\frac{1}{q_1q_2} + 1 - \frac{1}{q_1} - \frac{1}{q_2}\right]^{-1} = \frac{nq_1q_2}{p_1p_2} \qquad (24)$$

$$= \frac{n\,E[n-r]}{E[u_{11}]}. \qquad (25)$$
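Formula (20) can be confirmed by brute force for small n. The sketch below sums Chapman's estimate over the full multinomial distribution of $(u_{11}, u_{10}, u_{01}, u_{00})$ and compares the result with the closed form. The function names are hypothetical, and the values of n, $p_1$ and $p_2$ in the check are arbitrary.

```python
import math


def chapman_mean_exact(n, p1, p2):
    """E[n'] for n' = (a1+1)(a2+1)/(u11+1) - 1, summed over the multinomial (u11, u10, u01, u00)."""
    q1, q2 = 1 - p1, 1 - p2
    P11, P10, P01, P00 = p1 * p2, p1 * q2, q1 * p2, q1 * q2
    total = 0.0
    for u11 in range(n + 1):
        for u10 in range(n - u11 + 1):
            for u01 in range(n - u11 - u10 + 1):
                u00 = n - u11 - u10 - u01
                coef = math.factorial(n) // (math.factorial(u11) * math.factorial(u10)
                                             * math.factorial(u01) * math.factorial(u00))
                prob = coef * P11 ** u11 * P10 ** u10 * P01 ** u01 * P00 ** u00
                a1, a2 = u11 + u10, u11 + u01  # sample sizes
                total += prob * ((a1 + 1) * (a2 + 1) / (u11 + 1) - 1)
    return total


def chapman_mean_formula(n, p1, p2):
    """Formula (20): E[n'] = n - n q1 q2 (1 - p1 p2)**(n - 1)."""
    return n - n * (1 - p1) * (1 - p2) * (1 - p1 * p2) ** (n - 1)


print(chapman_mean_exact(8, 0.3, 0.5), chapman_mean_formula(8, 0.3, 0.5))
```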

5-3. In order to consider Chapman's recommendation (1952) for an s-sample unbiased estimate of n, let $a_{<k}$ denote the number of different individuals captured before the kth sample, $k = 2, 3, \ldots, s$. (Thus $a_{<2} = a_1$.) Further, let $a_{<k,k}$ denote the number of different individuals captured both before and at the kth sample. Thus, for instance, if s = 4,

$$a_{<3,3} = u_{1010} + u_{0110} + u_{1110} + u_{1011} + u_{0111} + u_{1111}.$$

Clearly, $a_{<k+1} = a_{<k} + a_k - a_{<k,k}$, and

$$n - a_{<k+1}, \quad a_{<k} - a_{<k,k}, \quad a_k - a_{<k,k}, \quad a_{<k,k}$$

are distributed multinomially with parameters n and

$$q_1\cdots q_{k-1}q_k, \quad (1 - q_1\cdots q_{k-1})q_k, \quad q_1\cdots q_{k-1}p_k, \quad (1 - q_1\cdots q_{k-1})p_k.$$

Therefore, in the same way as for s = 2,

$$n'_k = \frac{(a_{<k}+1)(a_k+1)}{a_{<k,k}+1} - 1 \qquad (k = 2, 3, \ldots, s)$$

is an almost unbiased estimate of n if, as we shall assume throughout this subsection, sampling is large enough. We notice that the covariance of any two of these estimates is negligible compared with their variances. For, if l < k, $n'_l$ is determined by the samples before the kth, and given these samples $n'_k$ depends on them only through $a_{<k}$, so that

$$\operatorname{cov}(n'_l, n'_k) = E\{n'_l\,(E[n'_k \mid a_{<k}] - E[n'_k])\},$$

which is negligible compared with $\operatorname{var}(n'_k)$. (Compare n times the difference between (21) and (23) with (25).)
So far, we can say that $\sum_{k=2}^{s}\lambda_k n'_k$, where $\sum\lambda_k = 1$, is an almost unbiased estimate of n, and that $\operatorname{var}(\sum_k\lambda_k n'_k) \simeq \sum_k\lambda_k^2\operatorname{var}(n'_k)$. Now, from (24), with $q_1$ replaced by $q_1\cdots q_{k-1}$ and $q_2$ by $q_k$,

$$\operatorname{var}(n'_k) \simeq n\left[\frac{1}{q_1\cdots q_k} + 1 - \frac{1}{q_1\cdots q_{k-1}} - \frac{1}{q_k}\right]^{-1} = \frac{1}{I_k}, \text{ say.}$$

$I_k$ may be described as the information on n contained in $n'_k$. Similarly, let J be the information contained in $\tilde n$, the original s-sample biased estimate of n. Then, from (18),

$$J = \frac{1}{n}\left[\frac{1}{Q} + s - 1 - \sum_i\frac{1}{q_i}\right]. \qquad (26)$$

We notice that $\sum_{k=2}^{s} I_k = J$, which means that the $n'_k$ contain between them as much information on n as $\tilde n$. However, the only linear function $\sum_k\lambda_k n'_k$ which uses all of the information is the one for which $\lambda_k = I_k/J$. The $I_k$ are unknown, and substituting their estimates would introduce the bias that the use of $\sum_k\lambda_k n'_k$ aims to avoid. Chapman proposed the arithmetic mean $\sum_k n'_k/(s-1) = n^*$, say. $n^*$, however, involves a loss of information of amount

$$J - (s-1)^2\Big/\sum_{k=2}^{s} I_k^{-1},$$

which will not in general be worth forfeiting, as the $I_k$ will most likely be fairly disparate. Accordingly, as a general rule, we may say that $\tilde n - \hat b$, where $\hat b$ is the estimated value of the bias b of $\tilde n$, is preferable to $n^*$.

The $n'_k$ may be called backward estimates of n, in contrast to the forward estimates $n''_k$ defined by

$$n''_k = \frac{(a_k+1)(a_{>k}+1)}{a_{k,>k}+1} - 1 \qquad (k = 1, 2, \ldots, s-1),$$

where $a_{>k}$ denotes the number of different individuals captured after the kth sample and $a_{k,>k}$ the number of these which are also caught at the kth sample. $\sum_{k=1}^{s-1}\lambda_k n''_k$ can be discussed in very much the same way as $\sum_{k=2}^{s}\lambda_k n'_k$, and, theoretically speaking, there is nothing to choose between them. There is an important practical difference, however: while the $n'_k$ can be evaluated from similar tagging, the $n''_k$ require differentiated tagging.
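The backward estimates need only whether each individual had been caught before, which is why similar tagging suffices. Here is a minimal sketch computing $n'_k$ and Chapman's mean $n^*$ from 0/1 capture histories. The function names are hypothetical, and the toy histories are invented purely for illustration.

```python
def backward_estimates(histories):
    """Backward estimates n'_k, k = 2..s, from 0/1 capture histories.

    Each history is a tuple of length s for one individual caught at least once.
    a_before = a_{<k}, a_k = size of the kth sample, a_both = a_{<k,k}.
    """
    s = len(histories[0])
    estimates = {}
    for k in range(1, s):  # 0-based column k is the (k+1)th sample
        caught_before = [h for h in histories if any(h[:k])]
        a_before = len(caught_before)
        a_k = sum(h[k] for h in histories)
        a_both = sum(h[k] for h in caught_before)
        estimates[k + 1] = (a_before + 1) * (a_k + 1) / (a_both + 1) - 1
    return estimates


def chapman_mean(estimates):
    """Chapman's arithmetic mean n* of the backward estimates."""
    return sum(estimates.values()) / len(estimates)


# Invented toy data: 14 individuals, s = 3 samples.
histories = [(1, 1, 0)] * 3 + [(1, 0, 1)] * 2 + [(0, 1, 1)] * 4 + [(0, 0, 1)] * 5
est = backward_estimates(histories)
print(est, chapman_mean(est))
```

Weighting by $\lambda_k = I_k/J$ instead of equal weights would require estimated $q_i$, which, as noted above, reintroduces bias.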


5-4. If 'catchability' is constant throughout the experiment, $p_i$ will differ from $p_j$ only if the amount of effort expended on the ith sample differs from that expended on the jth. More precisely, if we know that $e_i$ units of effort are expended on the ith sample, we may write

$$q_i = e^{-\alpha e_i} \qquad (i = 1, 2, \ldots, s).$$

This follows from writing the probability of any individual escaping capture when subjected to $\delta e$ units of catching effort as $1 - \alpha\,\delta e + o(\delta e)$. Previous authors have taken $p_i = \alpha e_i$, which is an approximation, albeit a perfectly valid one, to the above relation.

Incorporating $q_i = e^{-\alpha e_i}$ in (A′), it is found that the maximum likelihood estimates of α and n are obtained by solving

$$n - r = n e^{-\alpha\sum_i e_i} \quad\text{and}\quad \sum_i \frac{a_i e_i}{1 - e^{-\alpha e_i}} = n\sum_i e_i.$$

The method of moments versions of these equations are

$$r = E[r] \quad\text{and}\quad \sum_i \frac{a_i e_i}{p_i} = \sum_i \frac{E[a_i]\,e_i}{p_i}.$$

Using the approximation $p_i = \alpha e_i$, the latter equation becomes $\sum_i a_i = \sum_i E[a_i]$.

Is it worth including knowledge of the effort in the probability model? To answer this question as speedily as possible, consider the case when it is known that all $e_i$ are equal, and therefore all $p_i = p$, say. The equation for n is

$$(n - \bar a)^s = n^{s-1}(n - r),$$

with solution $\tilde n_e$, say, where $\bar a = \sum_i a_i/s$. We find that

$$\operatorname{var}(\tilde n_e) \simeq n\left[\frac{1}{q^s} + s - 1 - \frac{s}{q}\right]^{-1},$$

which is the same as (18) when $q_i = q$. Thus, in the limit, no information is gained by using the knowledge that an equal amount of effort has been given to each sample. This may be attributed to the fact that the knowledge would in any case have emerged from the sampling, since as $n \to \infty$, $\operatorname{plim}[a_i/n - a_j/n] = 0$ and therefore also $\operatorname{plim}[a_i/\tilde n - a_j/\tilde n] = 0$. Clearly, the conclusion of no gain in information as $n \to \infty$ may be extended to the case when the $e_i$ are known and unequal.

For finite n, knowledge of the effort will produce only a second-order increase in in- 
formation. This would not be negligible for small populations, but it is when estimating 
small populations that the experimenter would be most wary of assuming that catchability 
is constant. For this reason, it will be better in all cases to make the p; free parameters to 
be estimated solely from the sampling. On the other hand, this inference is valid only when 
the population is closed, that is when there is no natural or artificial alteration of the 
population size. Otherwise, knowledge of the effort is very useful. For its use in the fisheries 
type of census see Chapman (1954) for instance. 


5-5. From (26), the information as a function of effort is

$$J = \frac{1}{n}\left[e^{\alpha\sum_i e_i} + s - 1 - \sum_i e^{\alpha e_i}\right].$$

Suppose that catchability, represented by α, is constant and that there is a fixed total amount of effort for the whole experiment. Writing $\sum_i e_i = c$, we see that $Q = e^{-\alpha c}$ is fixed, and consequently so is $E[r]$. Also, we have that

$$J = \frac{1}{n}\left[e^{\alpha c} - 1 - \alpha c - \frac{\alpha^2}{2!}\sum_i e_i^2 - \frac{\alpha^3}{3!}\sum_i e_i^3 - \cdots\right].$$

For given s, $\sum_{i=1}^{s} e_i^2$ is a minimum, and therefore J is a maximum, when all the $e_i$ are equal. When they are equal,

$$J = \frac{1}{n}\left[e^{\alpha c} + s - 1 - s e^{\alpha c/s}\right],$$

and thus the information increases as s increases. Both of these conclusions are probably little more than reflexions of the fact that equal effort per sample and a larger number of samples give rise to a larger $E[\sum_i a_i]$ for given $E[r]$.

More important, from the practical point of view, than knowing how to get maximum information from fixed effort is to know how the information increases with increased effort. Supposing that all $e_i = \bar e$, say,

$$J = \frac{\alpha^2\bar e^2 s(s-1)}{2n}\left[1 + \tfrac{1}{3}\alpha(s+1)\bar e + \cdots\right].$$

Thus, if the effort per sample is enlarged to $k\bar e$ (k > 1), the information is enlarged to more than $k^2 J$. If, on the other hand, $\bar e$ is held constant and the number of samples is increased from s to s + 1, the information is multiplied by more than $(s+1)/(s-1)$. In other words, the information is roughly proportional to the number of different pairs of samples, as might be expected.
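The effort conclusions can be checked numerically from $J = n^{-1}[e^{\alpha c} + s - 1 - \sum_i e^{\alpha e_i}]$. This sketch uses a hypothetical function name and arbitrary illustrative values of α and the efforts. It compares equal and unequal allocations of the same total effort, more versus fewer samples, and the effect of doubling the effort per sample.

```python
import math


def information(alpha, efforts, n=1.0):
    """J = (1/n)[exp(alpha*sum e_i) + s - 1 - sum exp(alpha*e_i)], taking q_i = exp(-alpha*e_i)."""
    s = len(efforts)
    total = sum(efforts)
    return (math.exp(alpha * total) + s - 1 - sum(math.exp(alpha * e) for e in efforts)) / n


alpha = 0.1
print(information(alpha, [5, 5, 5, 5]), information(alpha, [2, 4, 6, 8]))  # same total effort
print(information(alpha, [4] * 5), information(alpha, [10] * 4))
```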


5-6. If s is large and each $p_i$ is small, the probability that any sample size is one will be a small quantity of first order, while the probability that it is greater than one will be of second order. Therefore, in the limit as $s \to \infty$ and each $p_i \to 0$, sampling becomes 'continuous' and each 'sample' is of size zero or one. In this way, we obtain a valid description of the experiment for which only one individual is captured at a time.

To formulate this idea precisely, let $C_x$ denote the coefficient of $z^x$ in $\prod_{i=1}^{s}(q_i + p_i z)$; that is, let $C_0 = Q$, $C_1 = \sum_i p_i Q/q_i$, $C_2 = \sum_{i<j} p_i p_j Q/(q_i q_j)$, .... Then, summing (A) over values of $\{u_\omega\}$ such that $f_x$ individuals are caught exactly x times (x = 1, 2, ...), the density of $\{f_x\}$, where $f_0 = n - r$, is

$$\frac{n!}{f_0!\,f_1!\cdots f_s!}\prod_{x=0}^{s} C_x^{f_x}. \qquad (27)$$

Consider the limit process: $s \to \infty$, $\max\{p_i\} \to 0$ such that $\prod_i(1-p_i) = Q$ remains fixed and equal to $e^{-\lambda}$, say; that is to say, such that $\alpha\sum_i e_i$ remains fixed and equal to λ. For this process,

$$\prod_i (1 - p_i + p_i z) \to e^{-\lambda(1-z)},$$

and therefore $C_x \to e^{-\lambda}\lambda^x/x!$. Thus, the limit of (27) is

$$\frac{n!}{f_0!\,f_1!\cdots}\prod_{x=0}^{\infty}\left(\frac{e^{-\lambda}\lambda^x}{x!}\right)^{f_x}. \qquad (28)$$

Let us now redefine s as the random variable $\sum_{x=1}^{\infty} x f_x$, the total number of catches made. (28) can then be written

$$\frac{n!}{(n-r)!}\,\frac{e^{-n\lambda}\lambda^s}{(1!)^{f_1}(2!)^{f_2}\cdots f_1!\,f_2!\cdots}. \qquad (A_1)$$

Craig (1953) postulated this model as an alternative to (10) and discussed the estimation of λ and n. Although the sampling is now a continuous process, it need not extend over only one interval of time. In practice, the experimenter will expend effort until he catches an individual and then pause while marking and recording it and letting it return to the population. There is no conflict between this practice and $(A_1)$.

The joint probability generating function of r and s for $(A_1)$ is

$$E[t_1^r t_2^s] = e^{-n\lambda}\left[1 + t_1\left(e^{\lambda t_2} - 1\right)\right]^n.$$

We notice that the marginal distribution of s is Poisson with parameter nλ, and therefore that the conditional probability of $\{f_x\}$ given s is

$$P[\{f_x\} \mid s] = \frac{1}{n^s}\,\frac{n!}{(n-r)!}\,\frac{s!}{(1!)^{f_1}(2!)^{f_2}\cdots f_1!\,f_2!\cdots},$$

which is (9) again. Thus, there are two routes from (A) to (9): either via (B) and $(B_1)$, or via $(A_1)$. The marginal distribution of r is binomial $B[n, 1 - e^{-\lambda}]$. The conditional probability of $\{f_x\}$ given r is therefore

$$P[\{f_x\} \mid r] = \frac{r!\,\lambda^s}{(e^\lambda - 1)^r\,(1!)^{f_1}(2!)^{f_2}\cdots f_1!\,f_2!\cdots}. \qquad (29)$$

Craig discussed this density from the point of view of the truncated Poisson distribution, in which the probability of a caught individual being caught x times is

$$\frac{\lambda^x}{(e^\lambda - 1)\,x!} \qquad (x = 1, 2, \ldots).$$

Plainly, however, (29) cannot serve as a probability description of any possible way of conducting a capture-recapture experiment, for it demands that both the total effort and the total number of different individuals be fixed in advance, which is impossible.
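The Poisson limit of the $C_x$ can be seen numerically by multiplying out $\prod_i(q_i + p_i z)$ for many small $p_i$. In this sketch the function name is hypothetical, and the choice $p_i = 1 - e^{-\lambda/s}$ (which keeps $\prod q_i = e^{-\lambda}$ exactly) and the values λ = 1.5, s = 20000 are illustrative assumptions.

```python
import math


def capture_count_coefficients(p, max_x):
    """Coefficients C_0..C_max_x of z**x in prod_i (q_i + p_i z), truncated at z**max_x."""
    coeffs = [1.0] + [0.0] * max_x
    for pi in p:
        qi = 1.0 - pi
        # update from the top so that coeffs[x - 1] still holds the previous value
        for x in range(max_x, 0, -1):
            coeffs[x] = qi * coeffs[x] + pi * coeffs[x - 1]
        coeffs[0] *= qi
    return coeffs


lam, s = 1.5, 20000
p = [1.0 - math.exp(-lam / s)] * s  # keeps prod(1 - p_i) = exp(-lam)
C = capture_count_coefficients(p, 5)
poisson = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(6)]
print([round(c, 4) for c in C], [round(v, 4) for v in poisson])
```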
5-7. We notice that (29) satisfies conditions (a), (b) and (c) given in § 3-2 for the applicability of large-sample maximum likelihood theory. This is also true of the counterpart of (29) for the general case. From (A) we have

$$p[\{u_\omega\} \mid r] = \frac{r!}{\prod_\omega u_\omega!}\,\frac{\prod_{i=1}^{s} p_i^{a_i} q_i^{\,r-a_i}}{(1-Q)^r}. \qquad (30)$$

(30), like (29), does not mention the uncaught individuals and does not truly describe any experiment. However, densities of this type are important as theoretical devices, as we can demonstrate by applying standard maximum likelihood theory to (30).

Maximizing $L = \log p[\{u_\omega\} \mid r]$ with respect to $\{p_i\}$, we obtain the equations

$$\frac{\hat p_i}{1 - \hat Q} = \frac{a_i}{r} \qquad (i = 1, 2, \ldots, s). \qquad (31)$$

Let $\theta = (1-Q)^{-1}$. Then $\hat\theta = (1-\hat Q)^{-1}$ is the maximum likelihood estimate of θ. (31) implies that $\hat\theta$ satisfies the equation $\prod_i(r\hat\theta - a_i) = (r\hat\theta)^{s-1}(r\hat\theta - r)$, that is, $r\hat\theta$ satisfies (1). Therefore $r\hat\theta = \tilde n$.
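We can obtain the asymptotic mean square error of $\hat\theta$ for (30) as $r \to \infty$ by first finding the information matrix, $V^{-1}$ say, whose (i, j) element is $-E[\partial^2 L/\partial p_i\,\partial p_j]$. We find that

$$V^{-1} = \frac{rQ}{(1-Q)^2}\,DWD,$$

where D is the diagonal matrix whose (i, i) element is $q_i^{-1}$, and W is the matrix whose (i, i) element is $w_i - 1$, where $w_i = q_i(1-Q)/(p_iQ)$, and whose every other element is −1.

Now, as $r \to \infty$, $E[(\hat\theta - \theta)^2 \mid r] \simeq d'Vd$, where d is the vector whose ith element is the derivative $\partial\theta/\partial p_i$ evaluated at $\{\hat p_i\} = \{p_i\}$. On differentiating $\theta = [1 - \prod_i(1-p_i)]^{-1}$, it is found that

$$d = -\frac{Q}{(1-Q)^2}\,D\mathbf{1},$$

where $\mathbf{1}$ is the vector all of whose elements are 1. Hence

$$E[(\hat\theta - \theta)^2 \mid r] \simeq \frac{Q^2}{(1-Q)^4}\,\mathbf{1}'DVD\mathbf{1} = \frac{Q}{r(1-Q)^2}\,\mathbf{1}'W^{-1}\mathbf{1}.$$

To find $\mathbf{1}'W^{-1}\mathbf{1}$, let $W^{-1}\mathbf{1} = x$. Then $Wx = \mathbf{1}$, and the ith element of x is seen to be

$$x_i = \frac{w_i^{-1}}{1 - \sum_j w_j^{-1}}.$$

Therefore

$$E[(\hat\theta - \theta)^2 \mid r] \simeq \frac{Q}{r(1-Q)^2}\,\frac{\sum_i w_i^{-1}}{1 - \sum_i w_i^{-1}}.$$

Substituting for $w_i$,

$$E[(\hat\theta - \theta)^2 \mid r] \simeq \frac{Q}{(1-Q)^2}\,\frac{\sum_i q_i^{-1} - s}{Q^{-1} + s - 1 - \sum_i q_i^{-1}}\,\frac{1}{r} = \frac{\phi}{r}, \text{ say.}$$

By means of the δ-technique we can further say that

$$E[(\hat\theta - \theta)^2 \mid r] = \frac{\phi}{r} + O\!\left(\frac{1}{r^2}\right) \quad\text{and}\quad E[\hat\theta - \theta \mid r] = O\!\left(\frac{1}{r}\right). \qquad (32)$$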


Using (32) we can re-derive formula (18) for the asymptotic value of $E[(\tilde n - n)^2]$. For, since $\theta = (1-Q)^{-1}$ and $\tilde n = r\hat\theta$,

$$E[(\tilde n - n)^2 \mid r] = r^2 E[(\hat\theta - \theta)^2 \mid r] + \theta^2(r - n(1-Q))^2 + 2r\theta(r - n(1-Q))\,E[\hat\theta - \theta \mid r]. \qquad (33)$$

In evaluating the expectation of the right-hand side of (33) over r, we consider the range $kn \le r \le n$, where $0 < k < 1 - Q$ and kn is integral. This approximation permits the use of the asymptotic formulae (32) and produces errors which asymptotically are negligible. Two observations are sufficient to establish the latter claim. First, if $P_r$ denotes the probability of r, then

$$\sum_{r=0}^{kn} P_r \le P_{kn}\,\frac{(n - kn + 1)(1-Q)}{(n+1)(1-Q) - kn}$$

(Feller, 1957, p. 140), and the last quantity is $O(n^{-1/2}c^n)$, where

$$c = \left(\frac{1-Q}{k}\right)^k\left(\frac{Q}{1-k}\right)^{1-k} < 1.$$

Secondly, $\hat\theta \le d(s)\,r$, where d(s) depends only on s, provided always that we ignore the possibility that $\sum_i a_i = r$, which makes $\hat\theta$ infinite. Substituting from (32), we find that

$$E[(\tilde n - n)^2] = \phi E[r] + \theta^2\operatorname{var}(r) + O(1) = n\left[\frac{1}{Q} + s - 1 - \sum_i\frac{1}{q_i}\right]^{-1} + O(1),$$

which is (18).
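The matrix algebra above can be checked numerically. This sketch builds $x = W^{-1}\mathbf{1}$ from the closed form and checks that $Wx = \mathbf{1}$. It also compares φ with its substituted expression and confirms that $\phi E[r] + \theta^2\operatorname{var}(r)$ reproduces the leading term $n/[Q^{-1} + s - 1 - \sum_i q_i^{-1}]$ of (18). It takes $E[r] = n(1-Q)$ and $\operatorname{var}(r) = nQ(1-Q)$ for the binomial r. The function name and the probabilities in the example are hypothetical.

```python
import math


def check_conditional_mse(p, n):
    """Check x = W^{-1} 1 in closed form and recover (18) from phi E[r] + theta^2 var(r)."""
    q = [1.0 - pi for pi in p]
    s = len(q)
    Q = math.prod(q)
    w = [qi * (1 - Q) / (pi * Q) for pi, qi in zip(p, q)]
    inv_sum = sum(1.0 / wi for wi in w)
    x = [(1.0 / wi) / (1.0 - inv_sum) for wi in w]
    # W has diagonal w_i - 1 and every off-diagonal element -1, so (Wx)_i = w_i x_i - sum(x).
    residual = max(abs(wi * xi - sum(x) - 1.0) for wi, xi in zip(w, x))
    phi = Q / (1 - Q) ** 2 * sum(x)  # leading coefficient of E[(theta_hat - theta)^2 | r]
    D = 1.0 / Q + s - 1 - sum(1.0 / qi for qi in q)
    phi_closed = Q / (1 - Q) ** 2 * (sum(1.0 / qi for qi in q) - s) / D
    theta = 1.0 / (1 - Q)
    mse = phi * n * (1 - Q) + theta ** 2 * n * Q * (1 - Q)  # phi E[r] + theta^2 var(r)
    return residual, phi, phi_closed, mse, n / D


print(check_conditional_mse([0.2, 0.35, 0.1, 0.25], n=1000))
```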
The rederivation of (18) is not important in itself. However, when combined with the fact that $\hat\theta$ is asymptotically efficient for θ, the above argument has an important consequence. It is this: consider the class of estimates $n^* = r\theta^*$ which satisfy the very reasonable conditions

$$E[(\theta^* - \theta)^2 \mid r] = \frac{\phi^*}{r} + \frac{\psi^*}{r^2} + o\!\left(\frac{1}{r^2}\right) \quad\text{and}\quad E[\theta^* - \theta \mid r] = \frac{\beta^*}{r} + o\!\left(\frac{1}{r}\right). \qquad (34)$$

Within this class, $\tilde n$ has asymptotically minimum mean square error, since φ is the minimum attainable value of $\phi^*$. (Conditions (34) are sufficient for this conclusion. They are quite possibly not necessary.) (34) implies that

$$E[(n^* - n)^2] = \alpha_1 n + \alpha_2 + o(1) \quad\text{and}\quad E[n^* - n] = \beta + o(1), \qquad (35)$$

which are even more reasonable. Unfortunately, (35) does not imply (34), though it is difficult to imagine estimates which satisfy (35) but not (34).

In conclusion: among a wide class of estimates of n, those derived from (A) or $(A_1)$ by the method of maximum likelihood are asymptotically best, in that they have minimum mean square error.


I wish to acknowledge my considerable indebtedness to the referee for his invaluable 
comments on two previous drafts of this paper. 
CONFIDENCE INTERVALS FOR DISTANCE
IN THE ANALYSIS OF VARIANCE 


By M. G. BULMER 
Department of Social and Preventive Medicine, University of Manchester 


1. INTRODUCTION


This paper is concerned with the so-called finite model (or Model I) of the analysis of variance. It is assumed that we are given n observations, $y_1, \ldots, y_n$, which are normally and independently distributed with the same variance, $\sigma^2$, about mean values which can be expressed in the form

$$E(y_i) = \sum_{j=1}^{k} x_{ij}\beta_j, \qquad (1)$$

where the $x_{ij}$'s are known, constant coefficients and the $\beta_j$'s are fixed but unknown parameters. This can be written in matrix notation as

$$E(y) = X\beta, \qquad (2)$$

where y is the n × 1 column vector of observations, X is the n × k matrix of the $x_{ij}$'s, and β is the k × 1 column vector of parameters. It is assumed that the model is stated in such a way that X is of rank k (this can always be done by eliminating any redundant parameters). The statistical problem is to test the null hypothesis, which specifies the values of the first r parameters, $\beta_1, \ldots, \beta_r$ ($r \le k$). Any hypothesis which specifies the values of linear, independent functions of the parameters can be put into this form by a suitable re-statement of the model.

It is convenient to partition X into $X_1$ and $X_2$, where $X_1$ consists of the first r columns of X and $X_2$ of the remaining k − r columns. β can likewise be partitioned into γ and ξ, where γ contains the first r parameters and ξ the remaining k − r parameters. Thus γ is the vector containing the true values of the parameters in which we are interested, while ξ is to be treated as a vector of nuisance parameters. It will also be convenient to write $\gamma_0$ for the set of parameter values specified by the null hypothesis, $\hat\gamma$ for the least squares estimate of γ, and $\gamma_1$ and $\gamma_2$ for arbitrary points in the hypothesis space, Γ, containing all possible sets of values of the first r parameters.

If $S_1$ is the sum of squares testing the null hypothesis, $\gamma_0$, then

$$S_1 = (\hat\gamma - \gamma_0)'B(\hat\gamma - \gamma_0), \qquad (3)$$

where

$$B = C_{11} - C_{12}C_{22}^{-1}C_{21} \qquad (4)$$

and $C_{ij} = X_i'X_j$ for i, j = 1, 2. This follows fairly easily from the results given in Kempthorne (1952, pp. 59-61). When the null hypothesis is true, $S_1/\sigma^2$ is distributed as a chi-square variate with r degrees of freedom, and $S_2/\sigma^2$, where $S_2$ is the error sum of squares, is independently distributed as a chi-square variate with n − k degrees of freedom. When the null hypothesis is false, $S_1/\sigma^2$ is distributed as a non-central chi-square variate with r degrees of freedom and non-centrality parameter λ, where

$$\lambda\sigma^2 = \delta = (\gamma - \gamma_0)'B(\gamma - \gamma_0). \qquad (5)$$
This expression has been obtained by substituting γ for $\hat\gamma$ in the right-hand side of equation (3). The distribution of $S_2$ is unchanged.

The purpose of this paper is to construct two types of confidence interval for $\delta = \lambda\sigma^2$. It will be shown in § 2 that $(\delta/n)^{1/2}$ is a useful measure of the 'distance' of the true hypothesis, γ, from the null hypothesis, $\gamma_0$. It is hoped that a confidence interval on this measure will be useful when one wants to know how good an approximation to the truth the null hypothesis is. In § 3 a 'simultaneous' confidence interval for δ is developed which can be used at the same time as Scheffé's (1953) simultaneous intervals on the individual parameters or linear functions of them. In § 4, an approximate confidence interval, in the ordinary sense of the word, is constructed for δ.


2. THE MEASURE OF DISTANCE 


The power of the analysis of variance depends on the ratio $\lambda = \delta/\sigma^2$, and previous work has been confined to this ratio. Confidence limits can be placed on λ, in principle exactly by using the exact non-central F distribution (see equation (33)), and in practice approximately by using Patnaik's approximation to this distribution (see equations (35) and (36)). Usually, however, the error variance is of no theoretical interest, and we should prefer to have information about δ rather than about λ.

The interpretation of $(\delta/n)^{1/2}$ as a natural measure of the distance of the true from the null hypothesis rests on the following two properties. First, let us, for arbitrary $\gamma_1$ and $\gamma_2$, define

$$d(\gamma_1 - \gamma_2) = (\gamma_1 - \gamma_2)'B(\gamma_1 - \gamma_2), \qquad (6)$$

so that $d(\gamma - \gamma_0) = \delta$. Then $\{d(\gamma_1 - \gamma_2)/n\}^{1/2}$ is a true metric function on the Γ-space. In fact it is the ordinary Euclidean distance on a linear transformation of the Γ-space, since B is positive definite. Secondly, for an arbitrary vector of nuisance parameters ξ, δ is the minimum value of the sum of squares

$$(X_1\gamma + X_2\xi - X_1\gamma_0 - X_2\xi_1)'(X_1\gamma + X_2\xi - X_1\gamma_0 - X_2\xi_1) \qquad (7)$$

over all $\xi_1$. This can easily be shown by differentiating (7) with respect to $\xi_1$ and equating to zero. Thus δ is the minimum sum of the squared deviations of the true model, Xβ, from the model, $X_1\gamma_0 + X_2\xi_1$, partially specified by the null hypothesis. There are n of these deviations, and so $(\delta/n)^{1/2}$ is a natural measure of the distance of the true hypothesis from the null hypothesis. In the case of an orthogonal design

$$\delta = (\gamma - \gamma_0)'C_{11}(\gamma - \gamma_0). \qquad (8)$$

In practice, δ is often most easily calculated from the formula

$$\delta = (S_1)_{y = X\beta}, \qquad (9)$$

that is, as the value taken by $S_1$ when each observation is replaced by its expected value. For example, consider a randomized block design in which the ith treatment has been replicated $n_i$ times and has an average effect $\tau_i$ (subject to the restriction $\sum n_i\tau_i = 0$). When the null hypothesis is that all the treatment effects are zero, it is easily seen that $\delta = \sum n_i\tau_i^2$.

The techniques developed in this paper can be applied to the chi-square test of goodness of fit as well as to the analysis of variance. For, as Patnaik (1949) has shown, the chi-square criterion is approximately distributed as a non-central chi-square variate with non-centrality parameter

$$\lambda = n\sum_i \pi_i\left(\frac{\pi_i - p_i}{\pi_i}\right)^2, \qquad (10)$$

where $\pi_i$ is the theoretical and $p_i$ the true probability of an observation falling in the ith class. The validity of the approximate expression for λ depends on $(\pi_i - p_i)$ being of order $n^{-1/2}$. Thus λ/n can be regarded as a weighted sum of squared proportional deviations of $p_i$ from $\pi_i$, where the weights add up to 1, and so $(\lambda/n)^{1/2}$ is a natural measure of the distance between the true and the null hypotheses. The problem of placing a confidence interval on this quantity is formally the same as that of placing a confidence interval on $(\delta/n)^{1/2}$ in the analysis of variance when it is known that σ = 1, that is, when the error mean square is equal to 1 and has infinite degrees of freedom. It is conjectured that a similar procedure is valid in other applications of the chi-square technique (for example, when some of the parameters are estimated from the sample).


3. A SIMULTANEOUS CONFIDENCE INTERVAL

Let us write $S_1(\gamma_1)$ for the sum of squares testing a particular hypothesis, $\gamma_1$. Suppose we assert that the true value, γ, satisfies the inequality

$$S_1(\gamma) \le rM_2F_\alpha, \qquad (11)$$

where $M_2 = S_2/(n-k)$ and $F_\alpha$ is the upper 100α% point of the F-distribution with r and n − k degrees of freedom. Then the probability of being correct will be 1 − α. Suppose, therefore, that we find the minimum and maximum values of $d(\gamma_1 - \gamma_0)$, defined in equation (6), over all values of $\gamma_1$ satisfying the inequality

$$S_1(\gamma_1) \le rM_2F_\alpha, \qquad (12)$$

and assert that δ lies between these extremes. We shall then be correct with a probability of at least 1 − α.

We first observe that the extreme values of $d(\gamma_1 - \gamma_0)$ over all values of $\gamma_1$ within the region defined by equation (12) are the same as the extreme values over all values of $\gamma_1$ on the surface of the region. The one exception is that the minimum value when $\gamma_0$ lies within the region should obviously be zero. This follows from the fact that $d(\gamma_1 - \gamma_0)$ can be regarded as ordinary Euclidean distance on a linear transformation of the Γ-space. These extreme values are given by the following theorem.

THEOREM. The extreme values of $d(\gamma_1 - \gamma_0)$ over all values of $\gamma_1$ on the surface $S_1(\gamma_1) = c$ are $[S_1(\gamma_0)^{1/2} \pm c^{1/2}]^2$.

Proof. By (3) and (6), $S_1(\gamma_1) = d(\hat\gamma - \gamma_1)$. Differentiating

$$d(\gamma_1 - \gamma_0) + mS_1(\gamma_1) \qquad (13)$$

with respect to $\gamma_1$ (where m is an arbitrary Lagrangian multiplier) and equating to zero, we obtain

$$(\gamma_1 - \gamma_0)'B - m(\hat\gamma - \gamma_1)'B = 0. \qquad (14)$$

Post-multiplying by $B^{-1}$ and transposing,

$$\gamma_1 - \gamma_0 = m(\hat\gamma - \gamma_1), \qquad (15)$$

which on re-arrangement gives

$$(1+m)(\gamma_1 - \gamma_0) = m(\hat\gamma - \gamma_0). \qquad (16)$$

Hence, from (15),

$$(\gamma_1 - \gamma_0)'B(\gamma_1 - \gamma_0) = m^2(\hat\gamma - \gamma_1)'B(\hat\gamma - \gamma_1) = m^2c, \qquad (17)$$

and from (16),

$$(1+m)^2(\gamma_1 - \gamma_0)'B(\gamma_1 - \gamma_0) = m^2(\hat\gamma - \gamma_0)'B(\hat\gamma - \gamma_0) = m^2S_1(\gamma_0). \qquad (18)$$

Eliminate m from (17) and (18), and the theorem is proved.
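The theorem can be checked numerically in two dimensions by tracing the ellipse $S_1(\gamma_1) = d(\hat\gamma - \gamma_1) = c$ and recording $d(\gamma_1 - \gamma_0)$ along it. In this sketch B, $\hat\gamma$, $\gamma_0$ and c are arbitrary illustrative values (not data from the paper), the function names are hypothetical, and the ellipse is parametrized through the Cholesky factor of B.

```python
import math


def quad(B, v0, v1):
    """d(v) = v' B v for a 2 x 2 symmetric matrix B."""
    (b11, b12), (_, b22) = B
    return b11 * v0 * v0 + 2 * b12 * v0 * v1 + b22 * v1 * v1


def extremes_on_surface(B, gamma_hat, gamma0, c, steps=20000):
    """Min and max of d(g1 - gamma0) over points g1 with d(gamma_hat - g1) = c.

    Points on that ellipse are gamma_hat + sqrt(c) * v, where L' v = u for a
    unit vector u and B = L L' is the Cholesky factorization.
    """
    (b11, b12), (_, b22) = B
    l11 = math.sqrt(b11)
    l21 = b12 / l11
    l22 = math.sqrt(b22 - l21 ** 2)
    values = []
    for j in range(steps):
        t = 2 * math.pi * j / steps
        u0, u1 = math.cos(t), math.sin(t)
        v1 = u1 / l22  # back-substitution for the upper-triangular L'
        v0 = (u0 - l21 * v1) / l11
        g0 = gamma_hat[0] + math.sqrt(c) * v0
        g1 = gamma_hat[1] + math.sqrt(c) * v1
        values.append(quad(B, g0 - gamma0[0], g1 - gamma0[1]))
    return min(values), max(values)


B = ((2.0, 0.5), (0.5, 1.0))
gamma_hat, gamma0, c = (1.0, 2.0), (4.0, -1.0), 1.5
S1 = quad(B, gamma_hat[0] - gamma0[0], gamma_hat[1] - gamma0[1])  # S_1(gamma_0) = 18
lo, hi = extremes_on_surface(B, gamma_hat, gamma0, c)
print(lo, (math.sqrt(S1) - math.sqrt(c)) ** 2)
print(hi, (math.sqrt(S1) + math.sqrt(c)) ** 2)
```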
Thus if we write $rM_2F_\alpha = c$ and assert

$$[S_1^{1/2} - c^{1/2}]^2 \le \delta \le [S_1^{1/2} + c^{1/2}]^2, \qquad (19)$$

where $S_1 = S_1(\gamma_0)$, the probability of being correct is at least 1 − α, except that the lower limit is to be taken as zero when $S_1 < c$. This is an extension of Scheffé's (1953) method for placing simultaneous confidence intervals on any number of linear contrasts, for any one of Scheffé's intervals is the shortest interval implied by the inequality (11). Thus we can place confidence intervals on any number of linear contrasts by Scheffé's method, and at the same time on the 'distances' of the true hypothesis from any number of null hypotheses by the method of this section. We can still be certain that the probability that all these intervals are correct is at least 1 − α, since they are all implied by (11). If, however, we only want a confidence interval on the distance of the true hypothesis from one null hypothesis, then we should be able to find a considerably shorter interval. This problem is considered in the next section.

Example 1. The data in Table 1 relate to the head breadths of 142 skulls belonging to three series and are taken from Rao (1952). An analysis of variance on these data is given in Table 2. We wish to calculate the 95% simultaneous confidence interval on the distance of the true hypothesis from the null hypothesis that the head breadths of the three series are the same. We find $c = rM_2F_\alpha = 2 \times 31.60 \times 3.06 = 192.8$, $c^{1/2} = 13.89$. Hence the limits for $\delta^{1/2}$ are $S_1^{1/2} \pm c^{1/2}$, or 1.56 and 29.34, and the required confidence interval for $(\delta/n)^{1/2}$ is 0.131 to 2.462 mm.


So far we have only considered the case of testing a single hypothesis, $\gamma = \gamma_0$. If, however, there are several factors in the experiment, we may want to consider the effects of these factors (and of their interactions) separately. That is to say, we want to split γ into m mutually exclusive components, $\alpha_1, \ldots, \alpha_m$, where $\alpha_i$ contains $r_i$ of the parameters in γ ($\sum r_i = r$). We then want to set limits on the distance measure $\delta_i$ of $\alpha_i$ from $\alpha_{i0}$, separately for i = 1, ..., m. Now, write $S_1(\alpha_{i0})$ for the sum of squares testing the sub-hypothesis $\alpha_i = \alpha_{i0}$. Then $S_1(\alpha_i) \le S_1(\gamma)$, and so equation (11) implies

$$S_1(\alpha_i) \le rM_2F_\alpha. \qquad (20)$$

We can thus assert, simultaneously for all i,

$$[S_1(\alpha_{i0})^{1/2} - c^{1/2}]^2 \le \delta_i \le [S_1(\alpha_{i0})^{1/2} + c^{1/2}]^2, \qquad (21)$$

except that, as before, the lower limit is to be taken as zero when $S_1(\alpha_{i0}) < c$.

Example 2. An example from genetics will illustrate this procedure. Mather (1951) quotes 
the following data of Philp (1934) on the joint segregation of the two factors p and t in the 
poppy. In a backcross progeny Philp observed the frequencies given in Table 3. The three 
components of a chi-square analysis are given in Table 4, together with the simultaneous 
confidence intervals on the distances from the null hypotheses that : (1) the P, p segregation 
is 1:1; (2) the T',t segregation is 1:1; and (3) the two segregations are independent. The 
confidence intervals have been obtained by taking c = 7:815, the upper 5% point of the 
chi-square distribution with 3 degrees of freedom. We can conclude that the average 
(i.e. root mean square) proportional discrepancies of the P, p and the T, t segregations from 
their theoretical values are at most 15-3 and 15-7 % respectively, whereas the proportional 
discrepancy from the hypothesis of independence is at least 55.8 9%. 


Table 3 
PpTt Pptt ppTt pptt Total 
Observed 191 37 36 203 467 
Expected (with 116-75 116-75 116-75 116-75 467 
no linkage) 
Table 4 
Confidence interval 
2 
Item X D.F. on (3,/n)i 
Segregation for P, p 0-259 1 0—0-153 
Segregation for T', t 0-362 1 0-0-157 
Joint segregation 220-645 1 0-558-0-817 


4. AN APPROXIMATE CONFIDENCE INTERVAL 


The simultaneous confidence interval just considered is of course only appropriate when we 
want to place limits on the distances of the true hypothesis from several null hypotheses, oF 
when we may want simultaneous limits on the individual parameters as well as on the 
distance. If we only want limits on the distance of the true hypothesis from one null hypo- 
thesis, then a much narrower interval can be found. The purpose of this section is to con- 
struct such an interval approximately. The criterion used in the construction is that the 
likelihood ratios of the two extremes of the interval should be the same; these extremes do 
not cut off equal tail areas and are thus not, separately, confidence limits. We must therefore 
first determine what the likelihood ratio is. 


— 
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It is shown by Kendall (1946, p. 300) that the logarithm of the likelihood ratio for a par- 
ticular hypothesis specifying the value of y, is 


E 

-] 14-7]. 23 
i log ( S (22) 
If we substitute in this expression the value of y, which minimizes S,(y,) subject to the 
condition 11 -v By, — Yo) wd, (23) 


we shall have the logarithm of the likelihood ratio for à. It can be shown by a method similar 
to the proof of the theorem of $3 that the minimum value of Si vi subject to this condition is 


-a, (24) 
where, as before, S, is written for S,(y,). The likelihood ratio for à is thus a decreasing 
function of 

(St- at (25) 


We shall therefore try to find a confidence interval for ô by asserting that 
E ow «g(F), (26) 


where F = rS,/M, and g is a function chosen to make the probability of the truth of the 
statement as nearly as possible 1 — a. This is equivalent to asserting that à! lies between the 


me S} + Mg (P. (27) 


It should be noted that the distribution of the expression (25) depends on the non-centrality 
parameter, A, but on nothing else; this is why g must be a function of F, which provides 

information about A. we IN 
Before constructing a function, g, we shall try to find an approximation to the distribution 
of (25). Consider (stab? 
eae c oe 


When A = 0, y is distributed as a chi-square variate with r degrees of freedom. When 
À —> co, y tends to a chi-square variate with 1 degree of freedom, since si tends to a normal 
variate with mean db and variance o? (Patnaik, 1949). For intermediate values of A, it 
seems reasonable to approximate ay by a chi-square variate with f degrees of freedom, where 
a and f are chosen to make the mean and variance of ay and of a chi-square variate with 
degrees of freedom the same. If this approximation is satisfactory, then the expression (25) 
multiplied by a/f can be approximated by an F. distribution with f and (n — degrees of 
freedom. 

The mean and variance of y can 
mean and variance of S$. Writing A j 


(28) 


easily be found from Patnaik’s (1949) formulae for the 


(29) 


1-t ; 1 
we find H(y) = (0 - wate tor ) } 
Vy) = [a e- e +2)7]+0(r). 


(30) 
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ay can thus be approximated by a chi-square variate with f degrees of freedom where 
2E(y) f- 2E*(y) 


p ˖ CM e 
If we knew A, we could use this approximation to determine g as 
g(F) = Ely) x U(f,n — k), (32) 


where U(f,n — k) is the upper 1002 % point of the F. distribution with f and n — degrees of 
freedom (for known A, g(F) wouldof course be independent of F). However, we do not know 
A and so it must be estimated in some way from the observed value of F. It is proposed to 
calculate Az, the lower 100(1— g) % limit of A given F, and to determine g(F) from equa- 
tion (32) as if this were the known value of A. This procedure ensures that the probability of 
the confidence interval covering the true value of ô is exactly 1—a when ô = 0; the prob- 
ability also tends to 1 — when A > oo. 

In principle, an exact lower confidence limit for A can be found from the observed value of 
F by solving the equation 7 
5 3 (33) 


for A, where p(x, A) is the density function of the non-central V. distribution with non- 
centrality parameter A. In practice, however, p is too complicated to evaluate, and so 
Patnaik’s (1949) approximation will be used, that t?F can be regarded as a central F-variate 
with v and 1 — degrees of freedom where 

Zur pA)? 


= 34 
3 7 ＋ 2A GH 


and tis defined in equation (29). Thus in order to find A, the lower limit for A, we must solve 
the equation 


te = U(r,n—k) (35) 
in terms of A. This is equivalent to 
F 
F 36) 
À [ocn i xr. ( 


This is an implicit equation since r is a function of A, but an iterative procedure converges 
fairly quickly. If F < U(r,n — k), À; is to be taken as zero. 

The calculation of the approximate confidence interval for ô is not so complicated as the 
above development might suggest and can be performed in four steps: 

(1) Find A; by solving (36) for A. 

(2) Find Z(y), V(y) and f from (30) and (31), using A; instead of A to calculate t. 

(3) Find g(F) from (32). 

(4) Assert that à! lies between the limits given in (27). 

Example 3. Let us find a confidence interval for the data of Example 1 by this method. 

(1) We must first solve (36) by iteration. Starting from (V= I) = 5:58, which is 
an unbiased estimate of A, we find » = 7-58%/13+16 = 4-37; U(4-37, 130) = 2:38 and 
A = (3°79/2-38— 1) x 2 = 1-18. Substituting this value, we find v = 2-32, U(2-32, 139) = 2:93 
and A = 0-587. Reiterating, v = 2-11, U = 3-02 and A = 0-538. Reiterating, v = 2:094, 
U = 3-02 and A = 0-538. This is the solution of (36). 
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(2) Substituting this figure in (29), @ = 0-212 and ( 0-460, Hence Ely) = 1-018, 
V(y) = 2-269 and f = 0-913. 

(3) Evaluating U(0-913, 139) by the method of the next section we find that it is 4-06 and 
so g(F) = 4:05 x 1-018 = 4-12, 

(4) M,g(F) = 4-12 x 31-5 = 129-3. Thus the limits for 44 are 15-45 + 11-39 and the limita 
for (/n) are 0-34-2-25 mm. 


5. PERCENTAGE POINTS OF F WITH FRACTIONAL DEGREES OF FREEDOM 


In the method developed in the previous section, it is necessary to know U(f,, fa), the upper 
percentage point of the F distribution with f, and f, degrees of freedom, where f, is usually 
fractional. When f, is less than 2, interpolation in the ordinary tables is no longer adequate. 
. I have therefore tabulated U(f,,oo) for f, = 0-2 (0-2) 2-2, at the 1 and 5% points. Linear 
interpolation in this table is quite adequate and, provided that f, is an integer greater than 6, 
U(f,,f;) can be found from the interpolation formula: 

UU f) = Uf) re Uf -v (37) 
The accuracy of this formula has been checked by obtaining U(2,f,) from the values of. 
U(1, fa), U(3, fa), U(1,o0), U, 00) and U(2, oo) substituted in the corresponding interpola- 
tion formula. Even for f, as low as 6 these values were within 0-1 % of the correct value. 


Table 5. Upper 1% and 5% points of F for f, = 0-2 (0-2) 2-2 and f, = oo. 


5:8044 
0:4 11-012 5:1526 
0-6 8-7980 4-5745 
0-8 7-5002 4:1549 
1-0 6:6349 3:8415 
r2 6-0110 3-5983 


6. THE ACCURACY OF THE APPROXIMATION 
The probability of the interval developed in $4 covering the true value of à is exactly 1 —a 
when A = 0 or when A oo. Some values of this probability, P, for intermediate values of A 
have been worked out by direct numerical integration and are given in Table 6. 
The problem is to evaluate 


P Pr E an) La vf SES cun]. (38) 


where h(F) = g(F)/(rF). Now his a strictly decreasing function of its argument and so the 
inverse function, h}, exists. Thus, if we define 
2 = (1-4D/8]* (39) 


8, 
we can write P = Pr Ih- «F] = Pr e]. (40) 
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This quantity can be evaluated by direct numerical integration, using Pearson’s (1922) 
tables of the incomplete gamma function; the density function of S, can be expressed as 
a simple combination of elementary functions when r is odd and as an infinite sum of Poisson 
functions when r is even (see Fisher, 1928; Patnaik, 1949). When r is small these expressions 
can be quite easily evaluated. 


Table 6. The exact probability level of the confidence interval when the nominal level is 0-95 
r is the number of degrees of freedom for ‘treatments’, n — k for error. 


Fig. 1. The 95 0% approximate confidence interval (broken) and the upper and lower 97] 96 confidence 


limits (continuous) for At when r—4, n — k= œ. The long and short dotted line is the maximum 
likelihood estimate. It is assumed that G = 1. 
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It can be seen from the results in Table 6 that the interval is usually a little too conserva- 
tive, that is, when the nominal level is 95 %, the actual level is usually slightly greater than 
95%. However, it never exceeds 96 % and is usually between 95 and 95] %; this is probably 
sufficiently accurate for most practical purposes. The evaluation of the density function of 
the non-central chi-square distribution becomes rather tedious when r is greater than 7 and 
the table has not been extended beyond this point. It can be plausibly conjectured, however, 
that similar results would be obtained if it were extended. 

As was shown in §4, exact confidence limits can be placed on the non-centrality para- 
meter, A. It is interesting to see how these limits compare with the interval proposed in this 
paper when the number of degrees of freedom for error is infinite (when of course the problems 
of setting limits on ó and A are the same since g? is known). The 95 % interval and the upper 
and lower 97] % limits are shown in Fig. 1 for the case r = 4. It will be seen that the limits 
of the interval are always greater than the corresponding two-sided limits and also that, for 
small Si, the interval is wider than the distance between the two-sided limits. The curious 
kink in the upper limit of the interval just above the upper 5 % point of S, on the null hypo- 
thesis is due to the extreme rapidity with which the effective number of degrees of freedom 
of y, defined by (28), decreases near A = 0. Despite these facts, it seems to me that the 
confidence interval is to be preferred to the two-sided limit, since the upper limit of the 
latter is, for small Si, less than the maximum likelihood estimate of A; this seems to me to be 


most undesirable. 


I am indebted to Mr A. M. Walker for many valuable discussions. 
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THE EFFICIENCIES OF ALTERNATIVE ESTIMATORS FOR 
AN ASYMPTOTIC REGRESSION EQUATION 


By D. J. FINNEY 
University of Aberdeen and A.R.C. Unit of Statistics 


SUMMARY 


Methods for the estimation of the parameter p in equation (1) have been discussed, this 
equation representing the expectation of observations on a quantity y for a specified value 
of x. The methods all involve taking as the estimator the ratio of either two linear functions 
or two quadratic functions of the observed y. Their relative efficiencies and biases are con- 
sidered under two models for the generation of observations, the y either being independent 
and normally distributed about their expectations with constant variance or arising from 
a continuous autoregressive process in which the variance increases and successive values 
of y for an individual are correlated. 

Detailed investigation has been possible only for the very simple situation of four obser- 
vations equally spaced in . As is well known, the Patterson estimator, a ratio of two linear 
functions, is of high asymptotic efficiency for the first model, and it proves to be also highly 
efficient for the second. An interesting alternative is the calculation of a linear regression 
of y; ,, on y;. This is also highly efficient, and has the advantage of simultaneously estimating 
the parameter æ; moreover, under the second model, the estimators of p and c maximize 
the likelihood for any number of equally spaced observations. However, the estimator of 
p appears likely to have a considerable negative bias if the variance of observations about 
their expectation is at all large. Other quadratic estimators have been examined, but showed 
no special merits. 

The indications are that the Patterson estimator is always a fairly safe one to use, for 
any number of equally spaced observations; its efficiency is never low, and it is unlikely to 
be seriously biased. The estimator caleulated autoregressively is likely to be more efficient 
(in the narrow sense of having a smaller asymptotie variance), especially for the second 
model and for a larger number of observations, and if the variance per observation is low 
this advantage will not be offset by its greater expectation of bias. 


I. INTRODUCTION 


The regression equation N (1) 
expressing the dependence of y on an independent variate x in terms of parameters , f P 
has many uses in biology and biometry. For example, it has often been used to give an 
approximate graduation of the yield of a crop receiving an amount z of fertilizer (Crowther 
& Yates, 1941; Hodnett, 1956); again, during some phases of growth, the relation between à 
measure of size of an animal and time may be approximately of this form. The parameter 
p must satisfy 0 — p « 1. In practical applications, æ and 2 are usually positive with J E 
but this is neither essential nor a limitation on the discussion that follows. It is worth noting 
that a change in the origin of z simply changes the value of f. 
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In estimating the parameters of equation (1) from observations, p presents the chief 
problem. If p were known, æ and # would be estimated from a linear regression of y on p*; 
even if only an estimate of p is available, this calculation is a reasonable method of obtaining 
x and 2 though in general not the most efficient. For observations entirely unrestricted in 
respect of x, the estimation of all parameters simultaneously, or even of p alone, is neces- 
sarily laborious. A number of useful suggestions have been made, however, for the important 
special case of equal numbers of observations at each of several equally spaced values of z, 
with equal variance for all observations, Stevens (1951) showed a satisfactory routine for 
constructing and solving the maximum likelihood equations, and provided tables for 
assisting this process. Pimentel Gomes (1953) developed this technique further, Patterson 
(1956) proposed an ingenious and simple method of estimating p as the ratio of two linear 
functions, and showed it to be highly efficient at least for a moderate number of different 
values of z. Hartley (1948) suggested a very different procedure that he termed ‘internal 
regression’, The primary purpose of this paper is to study the relations between the Patter- 
son method and alternatives related to internal regression, and to discuss their relative 
efficiencies. Two modes of generation of error are also considered. Complexity of the algebra 
restricts the detailed discussion to the simplest case, namely, that of four observations 
equally spaced in. respect of x. 


2. 'THE CLASS OF ESTIMATORS 


For equally spaced observations, the values of x may be taken as i = 1, 2, 3, ...,n; the data 
for analysis will be equal numbers of replicate single observations at each value of i. The 
mean of the observations at a particular i, denoted by y;, will in general deviate from the 
expectation given by equation (1). In this paper, two models for the variance of y, will be 
considered. For both models, the variance for a particular i is supposed to be inversely 
proportional to the number of replicates, but one has a variance that is constant for all i 
and the other a variance that increases as i increases. Thus, in respect of replication for the 
same set of x values, each y; is a consistent estimator of Y;, where 


Y; = a- fp. (2) 

All estimators of p to be discussed are ratios of two functions of the y;. Each is of the form 

3 (3) 

where A and B, functions of the y;, are consistent estimators of two functions of p whose 

ratio is P5 that is to say, E(A) — E, E(B) =, (4) 
where, as replication at each value of i increases, 

g> Šo 7 No (5) 

and p = Šolo: (6) 


For some of the estimators considered, A and B are unbiased estimators of £, a No» 80 en 
&, / are equal to £, Jo respectively, whatever the replication; more generally, i 15 2 4 
terms involving the variances of the y; that tend to zero as the replication increase 2 g 
r is a consistent estimator of p, it is not in general unbiased even when £ U £p 9] s qe 

The estimators that form the subject of this paper pave A and Bin 15 ae ies 
quadratic functions of the y;, and will be referred to as linear’ or ‘quadratic es i 
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3. EXPECTATION AND VARIANCE OF A RATIO 


Write R= £y, (7) 
and define the bivariate moments of A and B by 
va = E{(A —£)*(B-7)}. (8) 
c 5 —1 
Then, by writing r=R(1+4) («n 


expanding in series and taking expectations, an asymptotic expression for the expectation 
of r is easily obtained 
E(r) = R 


vn — Rr. 422- Rvo big Ros 
3 a 
3 U 


deste: (9) 


An alternative expansion in terms of p begins 
E) = p+ 659—091) 119) (6-8) —p—") "nu un (10) 
f No No 10 
Equation (9) displays the bias in r regarded as an estimator of R, equation (10) the bias in 
r regarded as an estimator of p. 
A similar procedure leads to an asymptotic expansion for the variance of r: 


V(r) Er- EO) 


1 2 
= P (v — 2Rv + N02) — p (0g; — 2K + HRG) 
1 
M (3022 — 511) — 2:R(3v,5 — 04 vos) + E? (3994 — 520 + ---- (11) 


The first term of this is the expression usually quoted as the asymptotic variance for a ratio, 
Bers Vir) = V(A—RB)|n?, (12) 


and to the first order R may be replaced by p. 
If A and B are normally distributed quantities, all odd moments vanish; 


Vy — O0 if (s+) is odd 


and Ugg = 920 502 L 2071, Vig = 311 P02 Vos = Wie 
Hence B(r) = Ress (15e...) 0 
5 y? 
1 
and V(r) = 7 (vao — 2Rv,, + O2) + à (3099 vog + 5v1, — 16.Rv,, Voa + 82053) + <- (11) 


These expressions agree with those obtained by Merrill (1928). 


4. CONSTANT VARIANCE MODEL 


The simplest model that may be postulated for the errors to which the y; are subject is that 
of each y; being normally distributed about Y; with constant variance and zero correlation 
between observations. That is to say 


E((y; — Y;) (yj -Y;)) = c? if ni (13) 
=0 if 143. 


g 
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This is likely to be a reasonable approximation in some circumstances, notably when the 
y, are mean yields of a crop in a field experiment to compare different levels, i, of fertilizer 
application. It is the model discussed by Stevens (1951) and Pimentel Gomes (1953), who 
developed the maximum likelihood estimation equations. According to Patterson (1956), 
Lipton has extended Stevens's tables so as to cover the range 3 « n « 12. 

For the particular case of n = 4 important to this paper, the variance of the maximum 
likelihood estimator of p is 


=$ (G++ +o) a 
VO = a-p (1-39 f i 2p + ph)" qo 
where $? = atf. (15) 


5. PATTERSON'S LINEAR ESTIMATOR 


For comparison with subsequent sections, it is necessary to summarize here the results 
that Patterson obtained for the general linear estimator 


Ils V Hana e H- ? (16) 
FAY n-1 + aU nat 1/1 


n-1 


where Xn = . 
1 


Tp= 


This has the general form described in § 2, with expectations of numerator and denominator 
independent of c?, and therefore g = £j 7 = Jo. From equation (11) or (11) 


V ay P? HE + (ts Alla)? + (s — Pha)? +--+ Pa (17) 
(rp) = p oo apt? tmp" +.. Haa) 


Values of the p; for minimizing V(rp) can readily be obtained, though the general solution 
does not appear to be expressible very simply. Patterson has proved that the minimizing 
„ make rp equal to f, the maximum likelihood estimator of p, a result which is of limited 
value because the ji; are themselves functions of p and therefore an iterative calculation is 
required. 
For n = 4, the estimator may be written 
eee. (18) 
um 70 (A — 1) y A 


Then ECA) = fp — p)(p* 2), 
E(B) = fp — p)(p +A); 
which have the properties already mentioned. Also, by equation (11) or (17) 


29 (1+4) (1 pp)-AQ0 +e)” 15 

Vee) ge (+p? 09 

2473) ＋ 4p? +p? 20 

This is minimized by A= [3493 891 a (20) 


and substitution of this in equation (19) gives the same expression as in (14), in accordance 
with the theorem due to Patterson stated above. 


24 
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Patterson observed that use of A = 1-25 preserves a high efficiency of rp whatever the 
value of p; this value is fully efficient at p = 0-357. Equations (18), (19) give 
57 (21) 
62 2(21+p+21p*) 
PUp (x4 ` 


He tabulated the efficiency of this estimator. He also mentioned that, if p is large (say 
p> 0:6), the estimator with A = 1 has a smaller variance 


Ya— 2 


and V (rp, 125) = (22) 


fp17———, 23 
BAT. (23) 

e: 
er = pa =pp (+p) e 

similarly, if p is small (say p < 0:15), A = 2 is preferable 

_ Yat Va — 2s 25 
* Yat Ya =% E 

2 2(3+p+ 3p? 
Views) = ae, OP (20) 


p(1—p)? (2+p)? 

Tf the efficiency of these estimators were not high enough, the maximum likelihood 
estimator could be obtained by substitution of rp for p in equation (20) followed by iteration 
on this equation and equation (18). In practice, one cycle of iteration would almost certainly 
be adequate. 

Although in practice the biases in these estimators are not likely to be important, and 
their consistency ensures that they approach the population value, p, as the replication is 
increased, there is some interest in employing equation (10) to compare the magnitudes 
of the biases to the order of the term in G2. For the general linear estimator 


M 62  (A—-1?-290(4?—A 4-1) 27 
Bep eU, t Qr py 
With the three particular constant values of A that have been used in this section, (27) 
becomes ge 1442, 
Hir p, vas) = P+ Ni P (28) 
; p*(1— py (5 + 4p) 
nando d RM (29) 
Mee) = P+ ja - peta 
$^ lt6p (30) 


E(rp,2) = P+ Ta — py py 


It is worth noting that, although rp, is more precise than rp, 1.2, when p is very small, the 
bias is relatively greater (Table 2). 


6. TAYLOR'S ESTIMATOR 


Dr St C. S. Taylor has suggested to me that a very simple procedure for estimating p is to 
calculate a regression coefficient of y;,, on y; (i = 1,2, ...,n— 1) by the ordinary formulae 


of unweighted linear regression, neglecting any complications resulting from the occurrence - 
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of y, both as the ‘independent’ and as the ‘dependent’ variate. This is suggested by a 
recurrence relation derived from equation (2) 


Y, = a(1— p) - pY,, (31) 


which, without implying any optimal properties, gives some hope that useful estimates of 
a and of p might be obtained from the regression equation, 

Evidently the numerator and denominator of the usual expression for the regression 
coefficient of y;, on y, are quadratic functions of the y,. Moreover, since 


E(yy;)= Yi+o® if i M (2) 
-XX i iej 
the expectations of numerator and denominator are of the form specified in § 2. 
When x = 4, the estimator of p becomes 
M FP (33) 
E 20 — Usa — Va VÀ 01 VÀO 
for which it is easily verified that 
E(A) = 2/*p(Y - p (1 uiia (34) 
E(B) = 2f"p*(1—p)*(1+p +p*) + 60°. 


Evaluation of the first-order term in the asymptotic variance of ry, from equation (11), is 
a tedious piece of algebra. As both numerator and denominator are independent of a, the 


transformation 2; - &—yi (35) 


enables ry to be written as exactly the same function of the z;. Moreover, from elementary 
properties of the normal distribution, variances and covariances of quadratie functions of 


the z; can be evaluated as Viet) = 4ptotp +204, 
V (z;2;) = Pop + p*!) +04, 
O(23, ate) = otp, 
C3) =0, (36) 
C(23, 25%) =0, à 
C (2,25, ter) = fa? pi, 


C(2,25,252,) — 0, ete., 
for i4-j K. , i 
In forming equation (12), only the term of order ø? will be considered. By use of (36), 
this is fi 
is is found to be p Vip dpi ph 25 
Vite) = ape 230m-ppnt 


an expression very similar to (14), but slightly greater except at the limits of the range of p. 
By (10), the expectation of ry can be found as 

9. 144 %. (38) 

Bier) =P- pq - pare +e 


24-2 
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7. HARTLEY ESTIMATORS 


In a paper in which he introduced a method that he described as ‘internal least squares’, 
Hartley (1948) appeared to suggest a method of estimation analogous to Taylor’s but 
instead using a regression of (i-! on (%. In fact, he goes on to advocate some- 
thing rather more complicated, but before this is discussed the simpler alternative is worth 
examination. From equation (2) 


1 — 
N- = 20 -G Y). (39) 


Hence, computation of a regression equation of (= on (Yi41 + should give estimates 
of æ and p, and, a priori, this method of estimation might be expected to be similar in 
character to Taylor’s. 

The numerator and denominator of the regression coefficient, formed uncritically in the 
ordinary manner, are quadratic functions of the y;; the ratio of the difference between these 
quantities to their sum is an estimator of p in the sense of § 2. The properties of the estimator 
can be studied exactly as in $6. For n = 4, the estimator is 


ns 294 — 3Y Y2— Vas + VÀ — Usa — Va + Yt YV, (40) 
274½ — Jaja — Vai T Y3—YsYo— 3 ½ + Yat YÄ 
Then E(A) = 2f2p*(1 — p? (1 - p) (1- p- p?) 2 P 
E(B) = 2*p*(Y — p)? (1 +p) (1 +p +p”) + 40°. 


By the use of (35), (36), the variance is evaluated as 


d? 3+4p+4p+4p°+3p* (42) 
P-P U Te 


V(r,) = 


exactly the same as for ryp. 
The estimate is not identical with 72, and it has a bias of opposite sign; by (10), the 
expectation of r, can be evaluated as 


me n 2+p+ Tp? + bp? + 3p* (43) 
P* RO — py A phy | 


The method that Hartley in fact advocated was not based so directly on (39), but involved 
a more complicated regression of the y; on partial sums of y;. I have been unable to under- 
stand what particular advantage can be claimed for this procedure on the evidence that 
Hartley supplies. Viewed as a regression calculation, it is considerably more laborious than 
either of those considered above, because of the various partial sums that must be formed. 
So far as its precision is concerned, the construction by way of a regression is irrelevant, 
because it takes no account of the pattern of the errors; the estimator is in reality just om 
more ratio of two quadratics, and its merit can be judged only by starting from this point. 

For n = 4, Hartley’s formulae can be arranged to give 


E(r,) 


g = BAA VAYs— YY — YV FY DS de 21 + ei. (44) 
3j — h — Jas 293 2½½ — 9a + J Vo I 
For this ratio 
E(A) = PU VH (45) 


E(B) = f^p*(Y — p)? (1 +p) (3+ 4p + 3p?) + 60°. 
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The variance is 
a $ AT+12p + 1292 p 1299 + 7p") 
Mra) = (Sx4prd3p) (46) 
The expectation of the estimator is 


E(rg) =p F. — X3404 20p* + 160 794) (47) 


+ „. —. — 


Pü-p* eee 
8. GENERAL QUADRATIC ESTIMATOR 
The analysis of the preceding two sections suggests that the most general quadratic estimator 


of the form 
* Ta = ND digi; (48) 
íi [y] 
that falls within the class of estimators of § 2 might usefully be studied. The conditions that 
the expectations of numerator and denominator of r shall be in the ratio p, except for 
multiples of c? in each, place a number of linear constraints on the coefficients that enable 


them all to be expressed in terms of five arbitrary quantities. These expressions have not 
been derived. By imposing the additional conditions that 


UP =0, Ld =0, 
the terms in o in numerator and denominator are eliminated, and therefore 
p = E(A)|E(B). (49) 


Under these more stringent conditions, only three arbitrary quantities remain; the coeffi- 
cients in (48) may be written 


Coefficient of Numerator Denominator 

y 0 27 

71 2u, — 4 2j 

Ys —3u,2u,— Hat 2h 2% — 3½ .. 2½ — Ha 
Ys 1— 2½ . lia 24 — 2+ r 2½ U. Me 
5 -h 4 24. -m th 
Yas 11 — 2½ ＋. Mat 2a Su, Ha 2½ L. Ha 
Yas — pat 2½ — 3½ 21 2711 — pa 205—394 
ys 11 — Hs „. 
Ysa 213 — 4j, n, 
9 2p, 0 


where the ratios ju, : Jis : Jig: 44 are arbitrary. Then 
E(A) = 2p*(1 — p Qt, tsp lia. NI (60) 
E(B) = 2p*(1 — p)! (a tsp + HP +p’). 
Heavy algebra leads to the asymptotic variance formula 
2 D Cij kiki 


geo DEI > (51) 
Vira) = 5 — py? 2a + Ma ap? ap 
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where Cy, = 7 220 + 42p? + 420? + 33p* + 1205 + 4p*, 
Ca = — (4+ 13) + 20p? + 209? + 14p* + 505 + 2p*), 
cis = 2(1+ 4p + 8p? + 11 p? + 10p* + 505 + 2p*), 
Cy, = —2(2+ 9p + 17p? + 23p + 17p* + 905 + 209), 
Cop = 4+ 8p + 11p?+ 10p? + 6p* + 205 + pP, 
Cas = —2(1+ 2p + 4p? + 5p + 4p* + 2p? + p*), 
Coq = 202 ＋ 5p + 10p? + 11p* + 8p* + 4 + p*), 
Cag = 1+ 9p + 6p? + 10p? + 11p* -- 8p? + 46, 
Cz = — (2+ 5p + 142 + 20p? + 20p* + 1305 + 4p), 
Cya = 4+ 12p + 33p? + 420? + 4204 + 2205 + 706. 
'The next step might appear to be the minimization of this variance by suitable choice of 
the u; Not only would this involve very laborious algebra, but the result, the ji; expressed 
as complicated functions of p, would be of limited practical interest. An alternative is first 
to try to constrain the j; so that (51) agrees with the known minimal values as p approaches 
0 or 1, namely 3d? g 
as may be seen from (14). This is achieved by taking 


respectively, 


Ha=0, ½ = 201 — Ula). 
Minimization of (51) is now readily shown to require 


Ha _ 4+ 9p + 18p? + 17p? + 12p* + 4p? (52) 
fy 1 4% ＋ 7p? +10p? + 6p* 4 


the value of which declines monotonically from 4 at p = 0 to 2 at p = 1. At p = 0-5, (52) 
bec 

LV 13½ = 3215, 
and evidently the value is nearly constant when p is in the upper part of its range. Hence, 
with the condition of being optimal at the extremes of p, the simple values 


1 = ld, „ = 0, e = 2, p, — 2 (53) 
seem likely to be fairly good throughout the range. 
These give 
— Ay 4½ % — 3½½ — Ya Y1 — . Tao — Va — SYI t WY (54) - 


4y4Ja — 4491 — 4½ As + 2 — MI 2A 
The expectations of numerator and denominator can be read by inserting (53) in (50). 
and from (51)
C (55) 
Uy pup 2014 2p?+ 2p8)? 
The estimator r_g is certainly biased, but the magnitude of the bias has not been examined.
Unfortunately, neither the general form of the quadratic estimator for n = 4, nor the study of optimal conditions, discloses any hint of appropriate further generalizations to higher values of n.
9. INCREASING VARIANCE MODEL

When equation (1) represents a biological growth curve, x is a measure of time, and it may 
be appropriate to incorporate into the model a variance that changes with time. One way 
of doing this is by means of a continuous autoregressive scheme in which the expected rate 
of growth at any time is given by a differential equation corresponding to (1), but in each 
element of time individuals are subject to an error distribution. This will produce a corre- 
lation between successive observations in respect of their deviations from the average 
curve as well as a steadily increasing total variance. The theory that follows in this section 
has analogies with the theory of Brownian movement, based upon Langevin's equation
(Chandrasekhar, 1943). 


It is useful to write
$$\log p = -\gamma \qquad (56)$$
and
$$p^x = u. \qquad (57)$$
Then, from equation (1),
$$\frac{dy}{dx} = \gamma(\alpha - y). \qquad (58)$$


Define K_y(θ, x) as the cumulant generating function of y for a particular value of the independent variate, x. Define also L(θ, x) dx as the cumulant generating function for the distribution of the additional 'error' acquired by an individual in the time interval (x, x + dx).

Express the condition that the cumulant generating function at time (x + dx) is the sum of the functions at x for the variate (y + dy) and for the error, or


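$$K_y(\theta, x + dx) = K_{y+dy}(\theta, x) + L(\theta, x)\,dx, \qquad (59)$$
where dy = γ(α − y) dx. Then
$$K_y(\theta, x) + \frac{\partial K_y}{\partial x}\,dx = K_y\{\theta(1 - \gamma\,dx), x\} + \gamma\alpha(i\theta)\,dx + L(\theta, x)\,dx = K_y(\theta, x) - \gamma\theta\frac{\partial K_y}{\partial \theta}\,dx + \gamma\alpha(i\theta)\,dx + L(\theta, x)\,dx,$$
and since ∂/∂x = −γu ∂/∂u,
$$u\frac{\partial K_y}{\partial u} - \theta\frac{\partial K_y}{\partial \theta} + \alpha(i\theta) + \frac{1}{\gamma}L(\theta, x) = 0. \qquad (60)$$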


Moreover, if time is measured from the point at which the error begins to take effect, this 
differential equation is subject to the end condition 


$$K_y(\theta, 0) = (\alpha - \beta)(i\theta). \qquad (61)$$
(i) Suppose now that the error increment is normally distributed with a variance that is constant and independent of y; that is to say
$$L(\theta, x) = \tfrac{1}{2}\sigma^2(i\theta)^2. \qquad (62)$$


The solution of (60) subject to (61) is, as might almost be guessed,
$$K_y(\theta, x) = (\alpha - \beta u)(i\theta) + \frac{\sigma^2}{4\gamma}(1 - u^2)(i\theta)^2. \qquad (63)$$
Thus y is normally distributed, with expectation still given by equation (1) but with variance now expressed by


$$V(y) = \frac{\sigma^2}{2\gamma}(1 - p^{2x}), \qquad (64)$$
and tending to the limit σ²/2γ as x becomes large.
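As an informal check on (64), the autoregressive scheme of case (i) can be simulated. The sketch below is a minimal Euler-Maruyama discretization. It assumes that the error acquired in (x, x + dx) is N(0, σ² dx), as in (62), and that every path starts at y = α − β when x = 0. The function names and parameter values are illustrative only. With a finite grid and a finite number of paths, the agreement is approximate.

```python
import numpy as np


def simulate_case_i(alpha, beta, gamma, sigma, x_end, n_steps, n_paths, seed=0):
    """Euler-Maruyama paths of dy = gamma*(alpha - y) dx + sigma dW, with y(0) = alpha - beta."""
    rng = np.random.default_rng(seed)
    dx = x_end / n_steps
    y = np.full(n_paths, alpha - beta, dtype=float)
    for _ in range(n_steps):
        # deterministic growth from (58) plus a normal error increment with variance sigma^2 dx
        y = y + gamma * (alpha - y) * dx + sigma * np.sqrt(dx) * rng.standard_normal(n_paths)
    return y


def variance_eq64(gamma, sigma, x):
    """V(y) = sigma^2 / (2 gamma) * (1 - p^(2x)), with p = exp(-gamma) from (56)."""
    p = np.exp(-gamma)
    return sigma**2 / (2.0 * gamma) * (1.0 - p ** (2.0 * x))


y = simulate_case_i(alpha=10.0, beta=8.0, gamma=0.5, sigma=0.3, x_end=2.0, n_steps=400, n_paths=20000)
print(y.mean(), 10.0 - 8.0 * np.exp(-1.0))   # sample mean vs alpha - beta * p^x
print(y.var(), variance_eq64(0.5, 0.3, 2.0))  # sample variance vs (64)
```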
(ii) A second possibility is to have a normally distributed error increment whose magnitude is proportional to the expectation of y. If κ_s(u) is written for the sth cumulant, then


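$$L(\theta, x) = \tfrac{1}{2}\sigma^2(\alpha - \beta u)(i\theta)^2. \qquad (65)$$
Equating coefficients of powers of iθ in (60) then gives
$$u\kappa_1'(u) - \kappa_1(u) + \alpha = 0,$$
$$u\kappa_2'(u) - 2\kappa_2(u) + (\sigma^2/\gamma)\,\kappa_1(u) = 0,$$
$$u\kappa_s'(u) - s\kappa_s(u) = 0 \quad\text{for } s \ge 3,$$
whence, in virtue of the end condition (61),
$$K_y(\theta, x) = (\alpha - \beta u)(i\theta) + \frac{\sigma^2}{4\gamma}(1 - u)\{\alpha(1 + u) - 2\beta u\}(i\theta)^2. \qquad (66)$$
Thus the expectation of y is the same, but now the variance is
$$V(y) = \frac{\sigma^2}{2\gamma}(1 - u)\{\alpha(1 + u) - 2\beta u\}. \qquad (67)$$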


(iii) The hypothesis that the increment in variance is proportional to the square of the 
expectation of y does not lead to a solution. 

(iv) Yet one more possibility worth mentioning is that of a normally distributed variance increment that decreases in proportion to the amount by which E(y) falls below its asymptote. This requires
$$L(\theta, x) = \tfrac{1}{2}\sigma^2 u\,(i\theta)^2, \qquad (68)$$
which gives rise to
$$K_y(\theta, x) = (\alpha - \beta u)(i\theta) + \frac{\sigma^2}{2\gamma}u(1 - u)(i\theta)^2. \qquad (69)$$
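The second-cumulant equations behind (66) and (69) can also be checked numerically. Written in x rather than u, the s = 2 coefficient of (60) reads dκ_2/dx = −2γκ_2 + σ²g(x), where g(x) is the variance rate of the error increment per unit σ². This rate is α − βu for case (ii) and, on our reading of (68), u for case (iv). The sketch integrates this equation from κ_2(0) = 0 with a fourth-order Runge-Kutta step. It then compares the result with the closed forms V(y) = (σ²/2γ)(1 − u){α(1 + u) − 2βu} for case (ii) and V(y) = (σ²/γ)u(1 − u) for case (iv). The helper name and parameter values are hypothetical.

```python
import math


def integrate_kappa2(gamma, sigma, rate, x_end, n_steps=2000):
    """RK4 solution of d(kappa2)/dx = -2*gamma*kappa2 + sigma**2 * rate(x), kappa2(0) = 0."""
    def deriv(x, k):
        return -2.0 * gamma * k + sigma**2 * rate(x)

    h = x_end / n_steps
    x, k = 0.0, 0.0
    for _ in range(n_steps):
        k1 = deriv(x, k)
        k2 = deriv(x + h / 2, k + h * k1 / 2)
        k3 = deriv(x + h / 2, k + h * k2 / 2)
        k4 = deriv(x + h, k + h * k3)
        k += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return k


alpha, beta, gamma, sigma, x_end = 10.0, 8.0, 0.5, 0.3, 2.0
u_end = math.exp(-gamma * x_end)  # u = p^x with p = exp(-gamma)

# case (ii): error variance rate proportional to E(y) = alpha - beta * u
v_ii_numeric = integrate_kappa2(gamma, sigma, lambda x: alpha - beta * math.exp(-gamma * x), x_end)
v_ii_closed = sigma**2 / (2 * gamma) * (1 - u_end) * (alpha * (1 + u_end) - 2 * beta * u_end)

# case (iv): error variance rate proportional to u, the shortfall below the asymptote
v_iv_numeric = integrate_kappa2(gamma, sigma, lambda x: math.exp(-gamma * x), x_end)
v_iv_closed = sigma**2 / gamma * u_end * (1 - u_end)

print(v_ii_numeric, v_ii_closed)
print(v_iv_numeric, v_iv_closed)
```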
Only the first of these models will be used further in this paper. Equation (64) for the variance can easily be generalized to show the covariance of the y values at times x_1, x_2, which is
$$C(y_1, y_2) = \frac{\sigma^2}{2\gamma}\,p^{x_2 - x_1}(1 - p^{2x_1}), \quad\text{where } x_2 \ge x_1. \qquad (70)$$


If observations are made at unit intervals of x, a transformation of values of y greatly simplifies the algebra needed in the study of the estimators of p. Suppose that at times X, X + 1, X + 2, ... the observations are y_1, y_2, y_3, .... Write
$$z_1 = y_1, \qquad z_i = y_i - p\,y_{i-1} \quad\text{for } i > 1. \qquad (71)$$
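Then it is easily verified that
$$E(z_1) = \alpha - \beta p^X, \qquad E(z_i) = \alpha(1 - p) \quad\text{for } i > 1, \qquad (72)$$
and also that
$$V(z_1) = \frac{\sigma^2}{2\gamma}(1 - p^{2X}), \qquad V(z_i) = \frac{\sigma^2}{2\gamma}(1 - p^2) \quad\text{for } i > 1, \qquad C(z_i, z_j) = 0 \quad\text{for } j > i > 0. \qquad (73)$$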
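The following is a rough simulation check of (71)-(73). Paths of the case (i) scheme are generated on a fine Euler grid. They are recorded at the unit-spaced times X, X + 1, ..., X + n − 1 and transformed to z_1 = y_1, z_i = y_i − p y_{i−1}. The parameter values, grid size and function names are assumptions made for illustration. Because the grid and the sample are finite, the agreement is only approximate.

```python
import numpy as np


def observe_case_i(alpha, beta, gamma, sigma, X, n, substeps, n_paths, seed=1):
    """Return an (n, n_paths) array of y at integer times X, ..., X + n - 1 (start y(0) = alpha - beta)."""
    rng = np.random.default_rng(seed)
    dx = 1.0 / substeps
    y = np.full(n_paths, alpha - beta, dtype=float)
    observations = []
    for t in range(X + n):
        if t >= X:
            observations.append(y.copy())
        if t == X + n - 1:
            break
        for _ in range(substeps):  # advance one unit of x on the fine grid
            y = y + gamma * (alpha - y) * dx + sigma * np.sqrt(dx) * rng.standard_normal(n_paths)
    return np.array(observations)


def z_transform(obs, p):
    """Equation (71): z_1 = y_1 and z_i = y_i - p * y_(i-1) for i > 1."""
    z = obs.copy()
    z[1:] = obs[1:] - p * obs[:-1]
    return z


alpha, beta, gamma, sigma = 10.0, 8.0, 0.5, 0.3
p = np.exp(-gamma)
obs = observe_case_i(alpha, beta, gamma, sigma, X=2, n=4, substeps=200, n_paths=20000)
z = z_transform(obs, p)
print(z.mean(axis=1))      # compare with alpha - beta*p^X, then alpha*(1 - p), from (72)
print(z.var(axis=1))       # compare with (sigma^2/2gamma)(1 - p^(2X)), then (sigma^2/2gamma)(1 - p^2)
print(np.corrcoef(z[1:]))  # off-diagonal entries should be near zero, as in (73)
```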


D. J. FINNEY 381


10. PATTERSON'S LINEAR ESTIMATOR 


Now consider once more the general linear estimator of p, equation (16), for a set of n observations made at times X, X + 1, ..., X + n − 1. Then


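$$A - pB = \mu_1 z_n + \mu_2 z_{n-1} + \dots + \mu_{n-1} z_2.$$
Write, here and subsequently,
$$\psi^2 = \frac{\sigma^2}{2\gamma\beta^2}. \qquad (74)$$
From (11), the asymptotic variance of the estimator is
$$V(r_P) = \frac{\psi^2(1 - p^2)(\mu_1^2 + \mu_2^2 + \dots + \mu_{n-1}^2)}{p^{2X}(\mu_1 p^{n-2} + \mu_2 p^{n-3} + \dots + \mu_{n-1})^2}. \qquad (75)$$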


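This expression is minimized with respect to the μ_i, subject to Σμ_i = 0, by taking
$$\mu_i \propto p^{n-1-i} - \frac{1}{n-1}\sum_{j=0}^{n-2} p^j, \qquad (76)$$
so that the minimal variance attainable by a linear estimator is
$$V_{\min} = \frac{\psi^2(1 - p^2)}{p^{2X}\left\{\sum_{j=0}^{n-2} p^{2j} - \frac{1}{n-1}\left(\sum_{j=0}^{n-2} p^j\right)^2\right\}}. \qquad (77)$$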


Moreover, as will be shown in §11, the variance of the optimal linear estimator tends to equality with that of the maximum likelihood estimator, and the estimator is therefore fully efficient.
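Equations (75)-(77) are easy to evaluate numerically. The sketch below works with the multiplier of ψ²/p^{2X}. It orders the weights so that μ_1 multiplies the latest observations, which is our reading of (75). It takes the optimal weights of (76) as p^{n−1−i} minus their mean, which is what minimizing (75) subject to Σμ_i = 0 gives. The function names are hypothetical.

```python
import numpy as np


def variance_multiplier(mu, p):
    """Multiplier of psi^2 / p^(2X) in (75); mu[0] is mu_1, paired with p^(n-2)."""
    mu = np.asarray(mu, dtype=float)
    powers = p ** np.arange(len(mu) - 1, -1, -1)  # p^(n-2), ..., p, 1
    return (1 - p**2) * np.sum(mu**2) / np.dot(mu, powers) ** 2


def optimal_weights(n, p):
    """Weights proportional to p^(n-1-i) minus their mean, as in (76); they sum to zero."""
    powers = p ** np.arange(n - 2, -1, -1)
    return powers - powers.mean()


def minimal_multiplier(n, p):
    """Multiplier of psi^2 / p^(2X) in (77)."""
    q = p ** np.arange(n - 1)
    return (1 - p**2) / (np.sum(q**2) - q.sum() ** 2 / (n - 1))


p = 0.5
w = optimal_weights(4, p)
print(w / w[0])                                         # proportional to (1, lambda - 1, -lambda), lambda = 1.25
print(variance_multiplier(w, p), minimal_multiplier(4, p))
print(3 * (1 + p) / (2 * (1 - p**3)))                   # the n = 4 closed form
```

For n = 4 the optimal weights at p = 0.5 reduce to the λ = 1.25 member of the family discussed next.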
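For n = 4, the estimator may be written in the form of equation (18), and the variance is
$$V(r_P) = \frac{2\psi^2(1 + p)(\lambda^2 - \lambda + 1)}{p^{2X}(1 - p)(\lambda + p)^2}. \qquad (78)$$
The variance is minimized by
$$\lambda = \frac{2 + p}{1 + 2p}, \qquad (79)$$
which leads to
$$V(r_P) = \frac{3\psi^2(1 + p)}{2p^{2X}(1 - p^3)}. \qquad (80)$$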


Evidently λ = 1.25 is again a good approximation at moderate values of p, corresponding exactly to the optimal at p = 0.5 instead of at p = 0.357; for large p, λ = 1 is superior and for small p, λ = 2 is superior. The variances for the estimators (21), (23), (25) are


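$$V(r_{P,1.25}) = \frac{42\psi^2(1 + p)}{p^{2X}(1 - p)(5 + 4p)^2}, \qquad (81)$$
$$V(r_{P,1}) = \frac{2\psi^2}{p^{2X}(1 - p)(1 + p)}, \qquad (82)$$
$$V(r_{P,2}) = \frac{6\psi^2(1 + p)}{p^{2X}(1 - p)(2 + p)^2}. \qquad (83)$$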
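The relations among (78)-(83) can be confirmed numerically. Substituting the λ of (79) into (78) should give (80). Setting λ = 1.25, 1 and 2 should give (81), (82) and (83). The sketch below works with multipliers of ψ²/p^{2X}, and its function names are illustrative.

```python
def v78(lam, p):
    """Multiplier of psi^2 / p^(2X) in (78) for the n = 4 estimator with parameter lambda."""
    return 2 * (1 + p) * (lam**2 - lam + 1) / ((1 - p) * (lam + p) ** 2)


def best_lambda(p):
    """Equation (79)."""
    return (2 + p) / (1 + 2 * p)


def v80(p):
    """Equation (80), the minimum of (78) over lambda."""
    return 3 * (1 + p) / (2 * (1 - p**3))


for p in (0.1, 0.5, 0.9):
    print(p, best_lambda(p), v78(best_lambda(p), p), v80(p))
```

At p = 0.5 the optimal λ is exactly 1.25, as stated above.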
The expectation of the general linear estimator, corresponding to equation (27) for the constant variance model, may be written (to the first-order terms in the bias)
$$E(r_P) = p + \frac{\psi^2(1 + p)\{(\lambda - 1)^2 + \lambda p\}}{p^{2X}(1 - p)(\lambda + p)^2}. \qquad (84)$$
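For the three values of λ of particular interest
$$E(r_{P,1.25}) = p + \frac{\psi^2(1 + p)(1 + 20p)}{p^{2X}(1 - p)(5 + 4p)^2}, \qquad (85)$$
$$E(r_{P,1}) = p + \frac{\psi^2 p}{p^{2X}(1 - p)(1 + p)}, \qquad (86)$$
$$E(r_{P,2}) = p + \frac{\psi^2(1 + p)(1 + 2p)}{p^{2X}(1 - p)(2 + p)^2}. \qquad (87)$$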
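The bias term of (84), divided by ψ²/{p^{2X}(1 − p)}, is the quantity tabulated in Table 4. The short sketch below evaluates it for λ = 1.25, 1 and 2, and it should reproduce the rows of that table. The function name is illustrative.

```python
def bias_multiplier(lam, p):
    """Multiplier of psi^2 / {p^(2X) (1 - p)} in the first-order bias term of (84)."""
    return (1 + p) * ((lam - 1) ** 2 + lam * p) / (lam + p) ** 2


for p in (0.0, 0.5, 1.0):
    print(p, [round(bias_multiplier(lam, p), 4) for lam in (1.25, 1.0, 2.0)])
```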


11. TAYLOR'S ESTIMATOR 


The form of equation (31), taken in conjunction with the properties of the z_i stated as equations (71), (72), (73), indicates that Taylor's autoregressive calculations correspond to maximum likelihood estimation for p, and indeed for p and α simultaneously. In fact, all the information on p contained in the observations is comprised in the statement that the z_i (i = 2, 3, ..., n) are normally distributed with equal expectations and equal variances and are uncorrelated. The magnitude of their expectation involves another parameter, α, and minimization of the sum of squares of the z_i (i > 1) from their mean is therefore the same as maximum likelihood estimation of p and α. This calculation is immediately seen to be the same as Taylor's calculation of an unweighted linear regression of y_{i+1} on y_i, and therefore r_T coincides with the maximum likelihood estimator p̂,
$$r_T = \hat{p}, \qquad (88)$$
for every value of n. The maximum likelihood estimate of β is then obtainable by equating z_1 to its expectation, and σ²/2γ can be estimated from the variances of the z_i.
At first sight, this appears to give the variance of r_T as
$$\frac{\sigma^2(1 - p^2)}{2\gamma\sum_{i=1}^{n-1}(y_i - \bar{y})^2},$$
where ȳ is the mean of y_1, ..., y_{n−1}. This is the ordinary formula for the variance of a linear regression coefficient when the individual residuals have the variance σ²(1 − p²)/(2γ) shown in (73). However, this needs to be modified here, since the y_i are themselves the observations, and in order to obtain the asymptotic variance of r_T the y_i must be replaced by their expectations. The result is obviously identical with (77), whence it follows that the optimal linear estimator is fully efficient.
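Taylor's estimator is simply the ordinary least-squares slope of y_{i+1} on y_i, and by (72) the intercept estimates α(1 − p). The following minimal pure-Python sketch assumes equally spaced observations. The helper name is illustrative.

```python
def taylor_estimate(y):
    """Unweighted least-squares regression of y[i+1] on y[i]; returns (r_T, alpha_hat)."""
    if len(y) < 3:
        raise ValueError("need at least three observations")
    x, t = y[:-1], y[1:]
    mx, mt = sum(x) / len(x), sum(t) / len(t)
    sxx = sum((a - mx) ** 2 for a in x)
    sxt = sum((a - mx) * (b - mt) for a, b in zip(x, t))
    r_T = sxt / sxx
    intercept = mt - r_T * mx  # estimates alpha * (1 - p), by (72)
    return r_T, intercept / (1 - r_T)


y = [10 - 8 * 0.6 ** x for x in range(6)]  # noise-free curve with alpha = 10, p = 0.6
print(taylor_estimate(y))                   # recovers p and alpha up to rounding error
```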

For n = 4, the variance of r_T is thus also given by (80), a result that may be verified by the alternative laborious algebraic process used in §12. The expectation of the estimator is


| 9* (1-9) 4p p?) (89) 
pXQu-p) N ^ 


Err) = p 
12. HARTLEY ESTIMATORS


Here also the estimator obtained from consideration of the regression of (y_{x+1} − y_x) on (y_{x+1} + y_x) merits examination. For n = 4, the formula for this estimator, r_h, is equation (40). Under the conditions of the increasing variance model, the numerator and denominator of r_h can be expressed in terms of the z_i and their expectations obtained by the use of (72), (73) as


E(A) = 2P PEHA — p?) (1 — p?) + (e*/2y) (1 — p°) [4 — p + 3p* + 2p(1 — p?) (1—p**)), 
E(B) = 2f%p**(1—p*) (1 — p?) + (o4/2y) (1 — p*) [2 + p + 29? + (1 — p?) (1 — p*X)]. 


Again, of course, numerator and denominator are of the form specified in §2.

Now (A − pB) can be written as a quadratic function of the z_i, the coefficients being polynomials in p. The variances and covariances of all the z_i² and z_i z_j can be formed from (72), (73) and simple properties of the normal distribution. Hence, the asymptotic variance of r_h, formula (12), can be evaluated. To the first order


$$V(r_h) = \frac{3\psi^2(1 + p)}{2p^{2X}(1 - p^3)}, \qquad (91)$$
which is identical with (80), and therefore r_h is of full efficiency for n = 4; whether it shares with r_T the property of full efficiency for all n is not known.
The same procedure can be used to give the asymptotic variance of r_H, defined by equation (44), the estimator that Hartley's procedure gives for n = 4. The result is
$$V(r_H) = \frac{2\psi^2(1 + p)(7 + 11p + 7p^2)}{p^{2X}(1 - p)(3 + 4p + 3p^2)^2}. \qquad (92)$$
The biases in r_h and r_H under the increasing variance model have not been investigated, as there appeared to be no evidence that these estimators possess any special interest.
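Multipliers of the kind tabulated in Table 3 can be computed directly from (80) and (92), both expressed as multiples of ψ²/{p^{2X}(1 − p)}. The sketch below shows that the relative loss of efficiency of r_H tends to 1/27, about 3.7%, as p approaches 0, and vanishes as p approaches 1. The function names are illustrative.

```python
def min_mult(p):
    """(80) as a multiplier of psi^2 / {p^(2X) (1 - p)}."""
    return 3 * (1 + p) / (2 * (1 + p + p * p))


def hartley_mult(p):
    """(92) as a multiplier of psi^2 / {p^(2X) (1 - p)}."""
    return 2 * (1 + p) * (7 + 11 * p + 7 * p * p) / (3 + 4 * p + 3 * p * p) ** 2


for p in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(p, round(min_mult(p), 4), round(hartley_mult(p), 4), round(hartley_mult(p) / min_mult(p), 4))
```

Algebraically, the excess of (92) over (80) is proportional to (1 − p²)², so r_H never beats the optimum and matches it only at p = 1.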


13. COMPARISON OF ESTIMATORS
All the variances relating to the constant variance model that were obtained in §§ 5-9 are multiples of
$$\frac{\phi^2}{p^{2X}(1 - p)^2}.$$
The efficiencies of the estimators can therefore easily be compared in terms of the multipliers of this quantity that occur for each estimator. Table 1 presents these multipliers for different p.

Formulae have been obtained earlier for the biases in these estimators, to the order of the terms in φ². These biases contain the same factor as do the variances, and the remaining factor is tabulated in Table 2.

Judged by asymptotic variance, Patterson's estimator with λ = 1.25 compares well with the maximum likelihood estimator, except that the loss of efficiency approaches 12% when p is small. If it were practicable to use λ = 2 instead of λ = 1.25 when p was less than 0.15, the maximum loss of efficiency could be kept down to about 3%. The Taylor estimator, r_T, however, succeeds even better, and without any necessity for some prior knowledge of p to guide the choice between alternative formulae; the loss of efficiency never exceeds 1%, whatever the value of p, and becomes negligibly small at either extreme of the range of p. The bias is of the same order of magnitude as for Patterson's estimators, but of opposite sign.

Table 1 may suggest that r_h is as good as r_T, in that it has the same asymptotic variance. Doubtless the two are members of a class all having the same variance function.

Efficiencies of alternative estimators


Table 1. Multipliers of φ²/{p^{2X}(1 − p)²} in the asymptotic variances of various estimators for the constant variance model

p     Max. lik.  r_{P,1.25}  r_{P,1}  r_{P,2}  r_T and r_h  r_H     r_g
0.0   1.5000     1.6800      2.0000   1.5000   1.5000       1.5556  1.5000
0.1   1.3945     1.4616      1.6694   1.4195   1.3977       1.4165  1.4059
0.2   1.2921     1.3103      1.4444   1.3719   1.2997       1.2999  1.3146
0.3   1.2048     1.2066      1.2899   1.3497   1.2143       1.2072  1.2263
0.4   1.1360     1.1368      1.1837   1.3472   1.1450       1.1367  1.1501
0.5   1.0849     1.0918      1.1111   1.3600   1.0918       1.0851  1.0918
0.6   1.0487     1.0650      1.0625   1.3846   1.0533       1.0488  1.0514
0.7   1.0246     1.0516      1.0311   1.4184   1.0271       1.0246  1.0253
0.8   1.0098     1.0482      1.0123   1.4592   1.0109       1.0098  1.0099
0.9   1.0022     1.0522      1.0028   1.5054   1.0025       1.0022  1.0022
1.0   1.0000     1.0617      1.0000   1.5556   1.0000       1.0000  1.0000

Table 2. Multipliers of φ²/{p^{2X}(1 − p)²} in the term of order φ² for the bias in various estimators under the constant variance model (a positive bias indicates that the expectation of the estimator exceeds p)

p     r_{P,1.25}  r_{P,2}  r_T       r_h      ?
0.0   0.0400      0.2500   -0.5000   1.0000   0.6667
0.1   0.1783      0.3628   -0.5722   0.8025   0.5509
0.2   0.2794      0.4545   -0.5983   0.6842   0.5140
0.3   0.3538      0.5293   -0.5926   0.6150   0.4996
0.4   0.4086      0.5903   -0.5671   0.5748   0.4984
0.5   0.4490      0.6400   -0.5306   0.5510   0.5016
0.6   0.4785      0.6805   -0.4894   0.5360   0.5050
0.7   0.4997      0.7133   -0.4472   0.5253   0.5068
0.8   0.5146      0.7398   -0.4065   0.5164   0.5066
0.9   0.5246      0.7610   -0.3683   0.5082   0.5042
1.0   0.5309      0.7778
However, r_T has the merit of being directly computable from the data by a standard regression procedure, whereas r_h requires the preliminary formation of (y_{x+1} − y_x) and (y_{x+1} + y_x) and also the final calculation of r_h from an estimator of (1 − p)/(1 + p). Moreover, the bias in r_h is substantially greater. The Hartley estimator, r_H, has an asymptotic variance that is somewhat larger than that of r_T for small values of p (p < 0.2), but for higher values approximates to the maximum likelihood variance even more closely than does r_T. At best, however, the gain is small and scarcely repays the more cumbersome calculations needed for the Hartley estimator. The estimator r_g is also of high efficiency, but has no special merits, and its lack of any obvious generalization to higher values of n makes it, for the present at least, of little interest.
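The Patterson columns of Tables 1 and 2 can be regenerated from first principles. The sketch below assumes independent errors of common variance σ² and writes φ² = σ²/β². It places the weights (1, λ − 1, −λ) on the three latest observations. Under those assumptions the variance multiplier is {2(λ² − λ + 1)(1 + p²) + 2p(λ − 1)²}/(λ + p)². The bias multiplier is {2p(λ² − λ + 1) + (λ − 1)²}/(λ + p)². These expressions are our own derivation, not formulae quoted from §§ 5-9, which are not reproduced in this section. They agree with the tabulated r_{P,1.25}, r_{P,1} and r_{P,2} entries, and they give 1.6694 for r_{P,1} at p = 0.1. The function names are illustrative.

```python
def constant_model_variance(lam, p):
    """Multiplier of phi^2 / {p^(2X) (1 - p)^2} in the asymptotic variance (constant variance model)."""
    s = 2 * (lam * lam - lam + 1)  # sum of squared weights
    c = -(lam - 1) ** 2            # sum of products of adjacent weights
    return ((1 + p * p) * s - 2 * p * c) / (lam + p) ** 2


def constant_model_bias(lam, p):
    """Multiplier of phi^2 / {p^(2X) (1 - p)^2} in the first-order bias (constant variance model)."""
    s = 2 * (lam * lam - lam + 1)
    c = -(lam - 1) ** 2
    return (p * s - c) / (lam + p) ** 2


for p in (0.0, 0.1, 0.5, 1.0):
    print(p,
          [round(constant_model_variance(lam, p), 4) for lam in (1.25, 1.0, 2.0)],
          [round(constant_model_bias(lam, p), 4) for lam in (1.25, 2.0)])
```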

For the increasing variance model, all the estimators discussed in §§ 10-12 have asymptotic variances that are multiples of
$$\frac{\psi^2}{p^{2X}(1 - p)}.$$
Table 3 shows the multipliers, and so enables the efficiencies to be compared.


Table 3. Multipliers of ψ²/{p^{2X}(1 − p)} in the asymptotic variance of various estimators for the increasing variance model


Formulae for biases have not been obtained for r_h and r_H; for the other estimators, the biases are tabulated in Table 4, again as multipliers of the factor that occurs in the variances.

Perhaps the most surprising feature of Table 3 is that, despite the very different models used, the general pattern is very similar to that of Table 1. The Patterson estimator with λ = 1.25 is good above p = 0.2; if it were practicable to use instead λ = 2 for smaller p, the maximum loss of efficiency would again be about 3%. Now, r_T and r_h are of full efficiency (though not identical with one another); r_H has a loss of efficiency that approaches 4% for small p but becomes vanishingly small as p approaches 1.

All that has been said so far in this section relates only to n = 4. Patterson's linear estimators have the advantage that, for the constant variance model, they have been studied for values of n up to 12; although their efficiency declines as n increases, it does not become intolerably bad, and there is no difficulty in forming an expression for the asymptotic variance. A similar study for higher n could be made fairly easily for the increasing variance model.

On the other hand, the Taylor estimator has been shown to be not merely of high efficiency but actually identical with the maximum likelihood estimator for the increasing variance model. Taken in conjunction with the close similarity between the two models in respect of the relative efficiencies of different estimators when n = 4, and in particular the high efficiency of r_T for the constant variance model, it seems reasonable to conclude that r_T will be an estimator of high efficiency for this model for all n. It may well continue to have


Table 4. Multipliers of ψ²/{p^{2X}(1 − p)} in the term of order ψ² for the bias in various estimators under the increasing variance model (a positive bias indicates that the expectation of the estimator exceeds p)

p     r_{P,1.25}  r_{P,1}  r_{P,2}
0.0   0.0400      0.0000   0.2500
0.1   0.1132      0.0909   0.2993
0.2   0.1784      0.1667   0.3471
0.3   0.2367      0.2308   0.3932
0.4   0.2893      0.2857   0.4375
0.5   0.3367      0.3333   0.4800
0.6   0.3798      0.3750   0.5207
0.7   0.4191      0.4118   0.5597
0.8   0.4551      0.4444   0.5969
0.9   0.4881      0.4737   0.6326
1.0   0.5185      0.5000   0.6667


the property of being more efficient than the Patterson estimator over almost the whole range of p.* Mathematical demonstration of this would be tedious for n = 5 and excessively laborious for higher n, unless a more general method than that of this paper is found. Taylor's method has a further advantage over Patterson's, in that the calculation of the regression of y_{i+1} on y_i simultaneously provides an estimate of α(1 − p), and therefore of α, whereas Patterson's calculation of r as a ratio of two linear functions must be followed by a calculation of the linear regression of y on r^x in order to obtain estimates of α and β. When equation (1) represents a growth curve, α may be of substantially greater practical interest than β, since it represents the final size attained whereas β depends largely on the time at which observations begin. Equation (39) shows that the regression calculations for r_h lead equally simply to simultaneous estimation of α, but, as already noted, r_h does not appear to have any advantages over r_T. Hartley's method, indeed, gives simultaneous estimates of the three parameters of equation (1), but the calculations are more laborious than for


* [See, however, the following paper by Mr Patterson.—Editor.] 
the other estimators and there seems to be no reason to suppose that r_H will acquire for larger values of n any of the superiority relative to r_T and r_h that it evidently lacks for n = 4.

The Taylor method, however, may be open to objection in respect of bias. It is well known that if the independent variate in ordinary linear regression calculations is subject to errors of measurement, the regression coefficient will tend to be underestimated. Dr F. Yates has pointed out to me that one might expect r_T to be negatively biased on this account; the fact that, in Tables 2 and 4, r_T is the only estimator to display a negative bias is in conformity with this. The proportionality of the bias to φ² or ψ² indicates that its relative importance would decrease if the replication at each value of x were increased and σ² therefore decreased; however, if resources for additional replication were utilized to permit a narrower spacing of the x values at which y is measured, instead of additional y measurements at each x, the bias need not diminish, and, indeed, Yates has suggested reasons why it might even become relatively more important. If observations were made at intervals of 0.5 in x instead of at unit intervals, for example, the Taylor calculations would lead to estimation of p^{1/2} instead of p, and thus a bias of the same magnitude as before would correspond to a proportionately greater bias in p. (The same argument leads to expectation of a positive bias in r_h.) When φ² or ψ² is very small, the bias is almost certainly too small to matter; for example, under carefully controlled conditions, Taylor (personal communication) has found successive observations on a particular measurement in the same animal to have φ² of the order of 0.001 or even 0.0001 under the constant variance model. When the observations are such that successive values of y must relate to different animals or different field plots, as in fertilizer response curves, φ² might be very much greater and the bias correspondingly more serious.

The conclusions to be drawn are that, for the estimation of p, both Patterson's linear estimation process and Taylor's are of high asymptotic efficiency. For the constant variance model, if it is essential to have a method for which the asymptotic variance is known, then Patterson's should be chosen pending further research on Taylor's; if attainment of high precision is more important than knowing that precision, Taylor's method may sometimes be preferable. For the increasing variance model, there seems to be no doubt about the superiority of Taylor's method in respect of the variance of the estimator of p, as it gives the maximum likelihood estimator and the variance can be computed by substitution of r_T for p in equation (77). On either model, Taylor's method will not be seriously biased when φ² or ψ² is very small, but it should be adopted with great caution if φ or ψ is large because of the possible danger of a large negative bias.

Some months after this paper was completed, Mr H. D. Patterson allowed me to see the typescript of a paper (Patterson, 1958) in which he has developed a more general attack on the same group of problems. Although his analysis confirms that, for the constant variance model, the estimator r_T is of high efficiency for n = 5, 6 and 7, he disproves the speculation above that it might still be more efficient than the linear estimators he had previously proposed. The advantage remains with r_T at extreme values of p, but for 0.3 < p < 0.7 the linear estimator may have an appreciably smaller variance. Still more disturbing is the size of the bias that attaches to r_T for larger n; for n = 7, the bias may be ten times as great relative to the variance as it is for n = 4. This fact places a severe restriction on any circumstances in which r_T might advisably be used. For the increasing variance model, the position has yet to be fully explored, and despite the full efficiency of r_T the biases may so increase for larger n as to make the method seldom trustworthy. Moreover, as Patterson rightly emphasizes, in practice observational data will usually be subject to random errors as well as to any arising from an autoregressive scheme, so that the truth may lie somewhere between the two mathematical models.


I am indebted to Dr St C. S. Taylor, of the Animal Breeding Research Organization, for 
arousing my interest in this problem and for stimulating discussions upon it, to Dr F. Yates 
for permission to use his comments on bias, and to Mr H. D. Patterson for showing me the 
typescript of his new paper. 
THE USE OF AUTOREGRESSION IN FITTING AN
EXPONENTIAL CURVE 


By H. D. PATTERSON
Rothamsted Experimental Station, Harpenden, Herts 


INTRODUCTION
The exponential regression curve
$$y = \alpha + \beta p^x, \quad\text{where } 0 < p < 1 \qquad (1)$$


is being increasingly used in statistical work in many branches of biology. Various numerical methods of fitting this curve have been suggested in recent years for cases in which the y's are independent and have equal variances and the x's take the n values 0, 1, 2, ..., n − 1. Stevens (1951) described a fully efficient least squares method for estimating α, β and p, and the errors of these estimates. The method is one of successive approximation and requires a reasonably accurate initial estimate of p. It works well on an electronic computer. He also provided tables for n = 5, 6 and 7, since extended by S. Lipton, which are very useful when the computations are carried out on a desk machine. Pimentel Gomes (1953) provided further tables for n = 4 and n = 5 which permit the least squares estimates of p to be obtained more rapidly than by the iterative procedure.
In a previous paper I pointed out that the least squares estimates of p, r̂ say, are of the form
$$\hat{r} = \frac{\sum_{x=1}^{n-1} w_x(\hat{r})\,y_x}{\sum_{x=1}^{n-1} w_x(\hat{r})\,y_{x-1}}, \qquad (2)$$
where the w_x(r̂) are complicated functions of r̂ (Patterson, 1956). I there suggested that replacing r̂ by p_0 in the w_x(r̂) would give estimates, r(p_0) say, where
$$r(p_0) = \frac{\sum_{x=1}^{n-1} w_x(p_0)\,y_x}{\sum_{x=1}^{n-1} w_x(p_0)\,y_{x-1}}, \qquad (3)$$
which are almost fully efficient over a range of values of p around p_0. In fact it was found that for each of the cases n = 4, 5, 6 and 7 a suitable choice of p_0 leads to very simple estimates of reasonably high efficiency over the useful range of p.


The main uses of this method are:
(a) to provide initial estimates of p for the method of Stevens (1951);
(b) to enable rapid checks to be made on assumed values of p.
Neither of these applications requires the estimation of α and β. The method is also useful, in the cases mentioned above, for a complete curve fitting when computing facilities are limited and full efficiency is not required. Satisfactory estimates of α and β can be obtained by simple regression of y on r^x, provided that r is a reasonably efficient and unbiased estimate of p.


Since, however, the range of p efficiently covered by r(p_0) decreases with increasing n, other simple methods require to be considered. In one very simple and at first sight attractive method p is estimated by the regression of y_{x+1} on y_x in the equation
$$y_{x+1} = \alpha(1 - p) + p\,y_x. \qquad (4)$$

This estimate is one of a family of estimates obtained by a procedure which Hartley (1948) has termed internal regression. Hartley himself considered a regression equation involving x and certain partial sums of the y_x. This regression equation is rather more difficult to handle than (4).

Recently Finney (1958) has made a detailed comparison of various estimates of p for the case n = 4. These estimates were:

(a) the simple regression estimate given by equation (4) (suggested to him by St C. S. Taylor);

(b) a similar estimate obtained from the regression equation
$$y_{x+1} - y_x = \frac{1 - p}{1 + p}(2\alpha - y_{x+1} - y_x); \qquad (5)$$

(c) the estimate proposed by Hartley (1948);

(d) the estimates proposed by Patterson (1956);

(e) an estimate given by the ratio of two quadratic functions of the y_x, but which is not expressible as a regression.

Finney found that the estimate (a) gave a good performance for n = 4. He suggested 
that it might well continue to have high efficiency for all n, but quoted comments by 
F. Yates to the effect that bias in the estimate can be expected to become increasingly 
important as n increases. 


Finney (1958) considered two models for the errors of the y_x, the constant variance model and a model in which the quantities
$$z_{x+1} = y_{x+1} - p\,y_x \qquad (6)$$
are subject to errors which are independent and have equal variance. For this latter model, estimate (a) is the maximum likelihood estimate of p.

In the course of the work already referred to I made a study of the behaviour of estimate (a) under the constant variance model for values of n > 4. This study leads to somewhat different conclusions than can be drawn from the single case n = 4. A report on the work appears therefore to be called for; this is the main purpose of the present paper. Although only estimate (a) will be considered in detail the theoretical results developed below can be applied to a wide range of estimates including all those mentioned above.


QUADRATIC ESTIMATES OF p 


The estimate of p given by the simple regression of y_{x+1} on y_x in equation (4) can be expressed as
$$r = \frac{\sum_{x=1}^{n-1} w_x y_x}{\sum_{x=1}^{n-1} w_x y_{x-1}} = \frac{A}{B}, \qquad (7)$$
where
$$w_x = y_{x-1} - \frac{1}{n-1}\sum_{t=0}^{n-2} y_t, \qquad (8)$$
i.e. in the same form as equation (3) but with w_x, a function of the y_x, replacing w_x(p_0), a function of p_0. Estimates of this type, given by ratios of quadratic functions of the y_x, can be referred to as 'quadratic' estimates in order to distinguish them from 'linear' estimates of the type (3).

It is convenient to express A and B in matrix notation as follows:
$$A = Y_1'W_0 = Y_1'D_0Y_0, \qquad (9)$$
$$B = Y_0'W_0 = Y_0'D_0Y_0, \qquad (10)$$
where Y_1' = (y_1 y_2 ... y_{n−1}), Y_0' = (y_0 y_1 ... y_{n−2}), W_0' = (w_1 w_2 ... w_{n−1}), and D_0 is a matrix of order (n − 1) × (n − 1) with each diagonal element equal to (n − 2)/(n − 1) and each non-diagonal element equal to −1/(n − 1).
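The matrix form (9)-(10) can be checked directly. D_0 is the (n − 1) × (n − 1) centring matrix, so Y_1'D_0Y_0/Y_0'D_0Y_0 should equal the least-squares slope of y_{x+1} on y_x. The following is a minimal NumPy sketch with made-up data. The function names are hypothetical.

```python
import numpy as np


def d0_matrix(m):
    """D_0 of order m = n - 1: diagonal (m - 1)/m, off-diagonal -1/m."""
    return np.eye(m) - np.full((m, m), 1.0 / m)


def quadratic_ratio(y, D):
    """r = Y_1' D Y_0 / Y_0' D Y_0 with Y_0 = (y_0, ..., y_(n-2)) and Y_1 = (y_1, ..., y_(n-1))."""
    y = np.asarray(y, dtype=float)
    y0, y1 = y[:-1], y[1:]
    return (y1 @ D @ y0) / (y0 @ D @ y0)


x = np.arange(7)
y = 10 - 8 * 0.6**x + np.array([0.05, -0.03, 0.02, 0.04, -0.06, 0.01, -0.02])  # illustrative data
r = quadratic_ratio(y, d0_matrix(len(y) - 1))
slope = np.polyfit(y[:-1], y[1:], 1)[0]  # ordinary regression of y_(x+1) on y_x
print(r, slope)
```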
More generally we can consider the regression equation 


You = T pep Ute en (11) 


where k ≠ 0. The regression coefficient is
yiDo( ey, +ly,) 


so that the estimate of p is now
$$r(0, k, l) = \frac{Y_1'D_0(kY_0 + lY_1)}{Y_0'D_0(kY_0 + lY_1)}. \qquad (12)$$
The significance of the 0 in the new notation for the estimate of p will be explained later; 
k and l are self-explanatory. Thus the r defined by equation (7) is now denoted by r(0, 1, 0). 

It can also be shown that r(0, k, l) is the estimate of p obtained from the regression of k'y_x + l'y_{x+1} on ky_x + ly_{x+1}, where k' and l' take any values such that kl' ≠ k'l. Thus, for example, the estimate obtained by Finney (1958) from the regression equation (5) is identical with the estimate obtained from the regression of either y_x or y_{x+1} on y_x + y_{x+1}.
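The equivalence claimed for Finney's estimate (b) can be confirmed numerically. The slope b of the regression of (y_{x+1} − y_x) on (y_{x+1} + y_x) estimates −(1 − p)/(1 + p), so p is recovered as (1 + b)/(1 − b). This value coincides with r(0, 1, 1). The sketch below uses NumPy with illustrative data and hypothetical names.

```python
import numpy as np


def r_0kl(y, k, l):
    """r(0, k, l) = Y_1' D_0 (k Y_0 + l Y_1) / Y_0' D_0 (k Y_0 + l Y_1), as in (12)."""
    y = np.asarray(y, dtype=float)
    y0, y1 = y[:-1], y[1:]
    m = len(y0)
    D0 = np.eye(m) - np.full((m, m), 1.0 / m)  # centring matrix D_0
    s = k * y0 + l * y1
    return (y1 @ D0 @ s) / (y0 @ D0 @ s)


x = np.arange(7)
y = 10 - 8 * 0.6**x + np.array([0.05, -0.03, 0.02, 0.04, -0.06, 0.01, -0.02])  # illustrative data

# regression (5): slope b of (y_(x+1) - y_x) on (y_(x+1) + y_x) estimates -(1 - p)/(1 + p)
b = np.polyfit(y[1:] + y[:-1], y[1:] - y[:-1], 1)[0]
p_from_5 = (1 + b) / (1 - b)
print(p_from_5, r_0kl(y, 1, 1))
```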

A general form for quadratic estimates of p is suggested by the above. This is obtained by replacing D_0 in equation (12) by D, any non-zero matrix of order (n − 1) × (n − 1) with elements in columns summing to zero. The restriction on the column totals of D ensures that terms in α and β are eliminated from the estimate of p.

If D is symmetrical it is usually most convenient to calculate the estimate from a regression equation obtained by a transformation of (11).

An important class of estimates, using symmetrical matrices of a special type, and having the property of asymptotic efficiency for some value of p will be considered in a later section. These estimates include r(0, k, l) considered above and the estimate suggested by Hartley (1948).

An example of an estimate of p using an asymmetrical matrix D is provided by the estimate r_g derived by Finney (1958) for n = 4. This can be obtained by putting


| — —1 and 
2 -1 4 
D 1 0]. 
1 0 —4 
EXPECTATION AND VARIANCE OF THE GENERAL QUADRATIC ESTIMATE


The method adopted to determine the expectations and variances of quadratic estimates of p is in principle exactly the same as that used by Finney (1958), but the algebra has been developed so as to give more general results and to permit arithmetical operations to be carried out in a systematic manner. The results given in this section apply to any quadratic estimate of the type considered in the previous section; the special features of the estimates r(0, k, l) will be dealt with later. The errors in the y_x are supposed to be independently and normally distributed with variance σ².

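The following formulae for the asymptotic variance and expectation of r = A/B will be used:
$$\operatorname{var} r = \frac{\operatorname{var} A + p^2 \operatorname{var} B - 2p \operatorname{cov}(A, B)}{\{\mathcal{E}^*(B)\}^2}, \qquad (13)$$
$$\mathcal{E}(r) = p + \frac{V_A - pV_B}{\mathcal{E}^*(B)} + \frac{p \operatorname{var} B - \operatorname{cov}(A, B)}{\{\mathcal{E}^*(B)\}^2}, \qquad (14)$$
where V_A and V_B are the terms in σ² in ℰ(A) and ℰ(B), respectively, and ℰ*(B) = ℰ(B) − V_B. These expressions are suitable when σ² is small relative to B. They have been discussed in detail by Finney (1958) and no further comment is required here.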

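The variances and covariances of A and B can be obtained by repeated use of the formula
$$\operatorname{cov}(s'Dt, u'Ev) = \mathcal{E}(s')DC_{tv}E'\mathcal{E}(u) + \mathcal{E}(s')DC_{tu}E\,\mathcal{E}(v) + \mathcal{E}(t')D'C_{sv}E'\mathcal{E}(u) + \mathcal{E}(t')D'C_{su}E\,\mathcal{E}(v). \qquad (15)$$
Here s, t, u, v are jointly normally distributed variates with covariance matrices
$$C_{tv} = \begin{pmatrix} \operatorname{cov}(t_1, v_1) & \operatorname{cov}(t_1, v_2) & \cdots \\ \operatorname{cov}(t_2, v_1) & \operatorname{cov}(t_2, v_2) & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}, \qquad (16)$$
etc. In addition the expression
$$\mathcal{E}(s'Dt) = \mathcal{E}(s')D\,\mathcal{E}(t) + \operatorname{trace}(DC_{ts}) \qquad (17)$$
is also required.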
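Formula (17) is easy to spot-check by simulation. The sketch draws jointly normal (s, t) with a chosen covariance matrix, averages s'Dt, and compares the result with ℰ(s')Dℰ(t) + trace(DC_ts), where C_ts holds the covariances cov(t_i, s_j). The means, covariance, D and sample size below are arbitrary choices for illustration. The Monte Carlo average agrees only up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 3
mean_s = np.array([1.0, 2.0, 3.0])
mean_t = np.array([0.5, -1.0, 2.0])
G = rng.standard_normal((2 * m, 2 * m))
cov = 0.2 * G @ G.T                  # joint covariance of (s, t), positive semi-definite
D = np.array([[2.0, -1.0, 0.0],
              [0.5, 1.0, -1.5],
              [-1.0, 0.0, 1.0]])     # any matrix serves for (17)

C_ts = cov[m:, :m]                   # C_ts[i, j] = cov(t_i, s_j)
exact = mean_s @ D @ mean_t + np.trace(D @ C_ts)

draws = rng.multivariate_normal(np.concatenate([mean_s, mean_t]), cov, size=200_000)
s, t = draws[:, :m], draws[:, m:]
monte_carlo = np.einsum("ni,ij,nj->n", s, D, t).mean()  # sample mean of s'Dt
print(exact, monte_carlo)
```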


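It is convenient to define two matrices
$$R' = (1 \;\; p \;\; p^2 \;\; \cdots \;\; p^{n-2}) \qquad (18)$$
and
$$U = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}, \qquad (19)$$
the auxiliary identity matrix, and the following scalar quantities:
$$F_0 = R'DR, \quad F_1 = R'D'DR, \quad F_2 = R'D'UDR, \quad F_3 = R'DD'R,$$
$$F_4 = R'DUD'R, \quad F_5 = R'DDR, \quad F_6 = R'DUDR, \quad F_7 = R'DU'DR. \qquad (20)$$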
The variances and covariances of A and B are then
var A Hoe lp F, + pP +P) Fy + 2kip*P, + 20l(k lp) F, + 2kp(k +p) F}, (21) 
var B = fPe*(E-1p F, + (E P) F, + 2HF, + 2k(E + lp) N 2k + Ip) FQ), (22) 


cov A, B = f*a* (lp) N=. 2HpF, + (E lp) (pk +l) F, 
+(k+lp) (EF, pF). (23) 


Also
$$\mathcal{E}^*(B) = \beta^2(k + lp)F_0. \qquad (24)$$
Substitution of (21) to (24) in (13) leads to the following simple expression
$$\operatorname{var} r = \frac{\sigma^2\{(1 + p^2)F_1 - 2pF_2\}}{\beta^2 F_0^2}. \qquad (25)$$
Thus the asymptotic variance of r is not affected by the values chosen for k and l.
It should be noted that the asymptotic variance of the linear estimate
$$r = \frac{Y_1'DR}{Y_0'DR} \qquad (26)$$
is also given by equation (25). This result will be used in the next section to derive quadratic estimates with minimum variance for a particular value of p.

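The bias in r, given by the second and third terms of (14), is rather more complicated and
depends on k and l. The two bias terms are

$$\frac{\sigma^2\{(l - k\rho)\operatorname{trace} D + k\operatorname{trace} DU - l\rho\operatorname{trace} DU'\}}{\beta^2(k+l\rho)F_0} \qquad (27)$$

and

$$\frac{\sigma^2\{(k+l\rho)(\rho F_3 - F_4) + (\rho k - l)F_5 - kF_6 + l\rho F_7\}}{\beta^2(k+l\rho)F_0^2}. \qquad (28)$$

When D is symmetrical, F_1 = F_3 = F_5 and F_2 = F_4 = F_6 = F_7, so that (28) simplifies to

$$\frac{\sigma^2\{(2k\rho - l(1-\rho^2))F_1 - 2kF_2\}}{\beta^2(k+l\rho)F_0^2}. \qquad (29)$$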


QUADRATIC ESTIMATES WITH MINIMUM VARIANCE WHEN ρ = ρ0


The construction of a matrix D such that the quadratic estimate of ρ has minimum asymptotic
variance when ρ takes some particular value, ρ0 say, will now be considered.

The required matrix can be obtained by minimizing the variance of the linear estimate
(26) since, as previously noted, the asymptotic variance of this estimate is the same as that
of a quadratic estimate using the same matrix D.


The asymptotic variance of (26) when ρ = ρ0 can be written

$$\frac{\sigma^2\left\{(1+\rho_0^2)\sum_{x=1}^{n-1} w_x^2 - 2\rho_0\sum_{x=1}^{n-2} w_x w_{x+1}\right\}}{\beta^2\left(\sum_{x=1}^{n-1} w_x\rho_0^{x-1}\right)^2}, \qquad (30)$$

where the w_x are proportional to the elements of DR. Two restrictions need to be placed on
the w_x. First, Σw_x must equal zero; secondly, the absolute magnitude of the w_x must be
fixed. It is convenient to write Σ_{x=1}^{n−1} w_x ρ0^{x−1} = Λ for the second restriction.
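The minimization of (30), subject to the two restrictions, leads to the n − 1 equations

$$(1+\rho_0^2)\,w_x - \rho_0(w_{x-1} + w_{x+1}) = \lambda_1\rho_0^{x-1} + \lambda_2,$$

where x = 1, 2, ..., n − 1, w_0 = 0, w_n = 0 and λ_1, λ_2 are constant for all x. In matrix notation

$$V\mathbf{w} = \lambda_1 R_0 + \lambda_2\mathbf{1},$$

where

$$V = \begin{pmatrix} 1+\rho_0^2 & -\rho_0 & 0 & \cdots & 0 \\ -\rho_0 & 1+\rho_0^2 & -\rho_0 & \cdots & 0 \\ 0 & -\rho_0 & 1+\rho_0^2 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & \cdots & -\rho_0 & 1+\rho_0^2 \end{pmatrix}, \qquad R_0' = (1,\ \rho_0,\ \rho_0^2,\ \ldots,\ \rho_0^{n-2}),$$

w is a column vector of the w_x and 1 is a column vector with all elements equal to 1. Hence

$$\mathbf{w} = \lambda_1 V^{-1}R_0 + \lambda_2 V^{-1}\mathbf{1}.$$

Since 1'w = 0,

$$\lambda_2\,\mathbf{1}'V^{-1}\mathbf{1} = -\lambda_1\,\mathbf{1}'V^{-1}R_0,$$

and since λ_1 can be chosen to be 1,

$$\mathbf{w} = V^{-1}R_0 - \frac{\mathbf{1}'V^{-1}R_0}{\mathbf{1}'V^{-1}\mathbf{1}}\,V^{-1}\mathbf{1}.$$

Thus the required matrix D which minimizes the variance of the quadratic estimate when
ρ = ρ0 is given by

$$D = V^{-1} - \frac{V^{-1}\mathbf{1}\mathbf{1}'V^{-1}}{\mathbf{1}'V^{-1}\mathbf{1}}.$$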


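The elements d_ij of D are obtained from the elements c_ij of V⁻¹ as follows:

$$d_{ij} = c_{ij} - \frac{c_{i\cdot}\,c_{\cdot j}}{c_{\cdot\cdot}}, \qquad c_{i\cdot} = \sum_j c_{ij},\quad c_{\cdot j} = \sum_i c_{ij},\quad c_{\cdot\cdot} = \sum_i\sum_j c_{ij}.$$

The c_ij = c_ji are given by

$$c_{ij} = \frac{\rho_0^{\,j-i}(1-\rho_0^{2i})(1-\rho_0^{2(n-j)})}{(1-\rho_0^2)(1-\rho_0^{2n})} \qquad (i \le j)$$

when ρ0 < 1, and

$$c_{ij} = \frac{i(n-j)}{n} \qquad (i \le j)$$

when ρ0 = 1.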
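The construction above can be checked numerically. The NumPy sketch below builds V, fills V⁻¹ from the closed forms for c_ij, and forms D. All function names are hypothetical. Indices in the code are 0-based internally, but the formulas use the 1-based i, j of the text.

```python
import numpy as np


def v_matrix(n, rho0):
    """(n-1) x (n-1) tridiagonal V: 1 + rho0^2 on the diagonal, -rho0 beside it."""
    m = n - 1
    return (1.0 + rho0**2) * np.eye(m) - rho0 * (np.eye(m, k=1) + np.eye(m, k=-1))


def v_inverse_closed_form(n, rho0):
    """Elements c_ij of V^{-1} from the closed forms in the text (1-based i <= j)."""
    m = n - 1
    c = np.empty((m, m))
    for i in range(1, m + 1):
        for j in range(i, m + 1):
            if rho0 == 1.0:
                val = i * (n - j) / n
            else:
                val = (rho0 ** (j - i) * (1 - rho0 ** (2 * i)) * (1 - rho0 ** (2 * (n - j)))
                       / ((1 - rho0**2) * (1 - rho0 ** (2 * n))))
            c[i - 1, j - 1] = c[j - 1, i - 1] = val
    return c


def minimum_variance_d(n, rho0):
    """D = V^{-1} - V^{-1} 1 1' V^{-1} / (1' V^{-1} 1), i.e. d_ij = c_ij - c_i. c_.j / c_.."""
    c = v_inverse_closed_form(n, rho0)
    row = c.sum(axis=1)            # c_i. (equal to c_.i because V^{-1} is symmetric)
    return c - np.outer(row, row) / row.sum()
```

With ρ0 = 0, V is the identity and D reduces to the centring matrix I − 11'/(n − 1). This matches the remark below that (12) is obtained by putting ρ0 = 0 in V.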

The quadratic estimates with D as determined above can conveniently be denoted by
r(ρ0, k, l). The estimate (12) is obtained by putting ρ0 = 0 in V. The r(0, k, l) are therefore
fully efficient when ρ = 0; in the case n = 4 they also tend to be fully efficient as ρ
approaches 1, but this is not true for n > 4. The estimate considered by Hartley (1948) is
asymptotically efficient when ρ = 1. It is, in fact, r(1, 1, 1) in the present notation.
The estimates r(ρ0, k, l) can be regarded as the quadratic counterparts of the estimates
considered by Patterson (1956). They can be expected to cover a wider range of ρ around
ρ0 with high efficiency, since D(ky_x + ly_{x+1}) makes some allowance for the difference between
ρ and ρ0. On the other hand, the r(ρ0, k, l) are generally more difficult to calculate and may
be subject to considerably greater bias. The linear estimates are subject to a bias term
of the order of magnitude of (29), but the quadratic estimates are subject to an additional
term (27) which can be very large.

One method of calculation when ρ0 < 1 is to estimate ρ (and, if required, α and β) in the
following regression equation on ρ0^x and Y_x:

Ye - -D (39) 
where x = 0,1,...,n—1 „= 21 7 p) (k + lpg) 


Y= (i-o) (lp) * 
Ya -Po Ya ™ ky, a ly, 


and the initial value of Y is taken to be zero. When ρ0 = 1, the following regression equation on x and Y_x
can be used:

a(1-p)(E-l) | (p—1) 
Yz = ( + (+p) Trip 


It will be noted that when ρ0 = 0 and the initial value of Y is taken to be 0, equation (39) degenerates to
equation (11) together with the initial equation


(40) 


yo 7 a- f. 


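EXPECTATION AND VARIANCE OF r(0, k, l)
The family r(0, k, l) has been considered in some detail for n = 4, 5, 6 and 7.
The matrix D given by (9) and (10) is particularly simple. Its trace,
required in (27), is n − 2 and trace DU is −(n − 2)/(n − 1). In addition F_1 = F_0.
Expressions for F_0 and F_2 are as follows:

$$F_0 = \tfrac{2}{3}(1-\rho)^2(1+\rho+\rho^2) \quad (n = 4),$$
$$F_0 = \tfrac{1}{4}(1-\rho)^2(3+4\rho+6\rho^2+4\rho^3+3\rho^4) \quad (n = 5),$$
$$F_0 = \tfrac{2}{5}(1-\rho)^2(2+3\rho+5\rho^2+5\rho^3+5\rho^4+3\rho^5+2\rho^6) \quad (n = 6),$$
$$F_0 = \tfrac{1}{6}(1-\rho)^2(5+8\rho+14\rho^2+16\rho^3+19\rho^4+16\rho^5+14\rho^6+8\rho^7+5\rho^8) \quad (n = 7);$$

$$F_2 = \tfrac{1}{9}(1-\rho)^2(-1+2\rho-\rho^2) \quad (n = 4),$$
$$F_2 = \tfrac{1}{16}(1-\rho)^2(-1+8\rho+6\rho^2+8\rho^3-\rho^4) \quad (n = 5),$$
$$F_2 = \tfrac{1}{25}(1-\rho)^2(-1+16\rho+20\rho^2+30\rho^3+20\rho^4+16\rho^5-\rho^6) \quad (n = 6),$$
$$F_2 = \tfrac{1}{36}(1-\rho)^2(-1+26\rho+38\rho^2+64\rho^3+61\rho^4+64\rho^5+38\rho^6+26\rho^7-\rho^8) \quad (n = 7).$$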


Although not essential to the determination of n d Ak icc em, 85 
useful for studying the limiting case as p—>1 when (25) and (29) Seemne 
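The closed forms for F_0 and F_2 can be checked against the definitions (20) by direct matrix arithmetic. This sketch assumes that D for r(0, k, l) is the centring matrix I − 11'/(n − 1), which is what the construction of D gives when ρ0 = 0 (V = I). It also assumes that U has its ones immediately above the diagonal. The dictionary and function names are illustrative.

```python
import numpy as np

# Closed forms listed above: (constant, coefficients of the polynomial factor, rho^0 first)
F0_CLOSED = {4: (2 / 3, [1, 1, 1]),
             5: (1 / 4, [3, 4, 6, 4, 3]),
             6: (2 / 5, [2, 3, 5, 5, 5, 3, 2]),
             7: (1 / 6, [5, 8, 14, 16, 19, 16, 14, 8, 5])}
F2_CLOSED = {4: (1 / 9, [-1, 2, -1]),
             5: (1 / 16, [-1, 8, 6, 8, -1]),
             6: (1 / 25, [-1, 16, 20, 30, 20, 16, -1]),
             7: (1 / 36, [-1, 26, 38, 64, 61, 64, 38, 26, -1])}


def f0_f2_numeric(n, rho):
    """F0 = R'DR and F2 = R'D'UDR with D = I - 11'/(n - 1)."""
    m = n - 1
    R = rho ** np.arange(m)                 # R' = (1, rho, ..., rho^(n-2))
    D = np.eye(m) - np.ones((m, m)) / m     # D with rho0 = 0 (V = I)
    U = np.eye(m, k=1)                      # ones immediately above the diagonal
    return R @ D @ R, R @ D.T @ U @ D @ R


def f_closed(table, n, rho):
    const, coeffs = table[n]
    return const * (1 - rho) ** 2 * np.polynomial.polynomial.polyval(rho, coeffs)


for n in (4, 5, 6, 7):
    f0, f2 = f0_f2_numeric(n, 0.3)
    print(n, np.isclose(f0, f_closed(F0_CLOSED, n, 0.3)), np.isclose(f2, f_closed(F2_CLOSED, n, 0.3)))
```

The factor (1 − ρ)² shared by both quantities is what makes the limiting case ρ → 1 tractable.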
EFFICIENCIES OF THE ESTIMATES r(0, k, l)


The efficiencies of r(0, k, l) when n = 4, 5, 6 and 7 are set out in Table 1. These are given by
the ratios of the asymptotic variances of the least squares estimates obtained by the method
of Stevens (1951) to the variances given by (25). For comparison, the efficiencies of the
following estimates

$$r = \frac{4y_3 + y_2 - 5y_1}{4y_2 + y_1 - 5y_0} \quad \text{for } n = 4, \qquad (41)$$

$$r = \frac{4y_4 + 3y_3 - y_2 - 6y_1}{4y_3 + 3y_2 - y_1 - 6y_0} \quad \text{for } n = 5, \qquad (42)$$

$$r = \frac{4y_5 + 4y_4 + 2y_3 - 3y_2 - 7y_1}{4y_4 + 4y_3 + 2y_2 - 3y_1 - 7y_0} \quad \text{for } n = 6, \qquad (43)$$

$$r = \frac{y_6 + y_5 + y_4 - y_2 - 2y_1}{y_5 + y_4 + y_3 - y_1 - 2y_0} \quad \text{for } n = 7, \qquad (44)$$

are also shown in Table 1. These are the estimates proposed by Patterson (1956).


Table 1. Efficiencies (%) of simple estimates of ρ

ρ   | n = 4: r(0, k, l) | Eq. (41) | n = 5: r(0, k, l) | Eq. (42) | n = 6: r(0, k, l) | Eq. (43) | n = 7: r(0, k, l)
0.0 | 100.0 | 89.3 | 100.0 | 77.4 | 100.0 | 65.2 | 100.0
0.1 | 99.8  | 95.4 | 99.4  | 88.5 | 99.2  | 79.2 | 99.1
0.2 | 99.4  | 98.6 | 98.0  | 95.7 | 97.2  | 90.2 | 96.7
0.3 | 99.2  | 99.8 | 96.6  | 99.2 | 94.6  | 97.0 | 93.3
0.4 | 99.2  | 99.9 | 95.5  | 99.9 | 92.2  | 99.6 | 89.7
0.5 | 99.4  | 99.4 | 95.0  | 98.9 | 90.4  | 98.8 | 86.6
0.6 | 99.6  | 98.5 | 94.8  | 97.0 | 89.4  | 96.1 | 84.5
0.7 | 99.8  | 97.4 | 94.9  | 94.7 | 89.1  | 92.5 | 83.5
0.8 | 99.9  | 96.3 | 95.1  | 92.3 | 89.1  | 88.8 | 83.2
0.9 | 100.0 | 95.3 | 95.2  | 90.0 | 89.2  | 85.2 | 83.3
1.0 | 100.0 | 94.2 | 95.2  | 87.8 | 89.3  | 81.9 | 83.3


The results for n = 4 are in accord with those given by Finney (1958). r(0, k, l) has very
high efficiency over the whole range of ρ, and gives a rather better overall performance than
the estimate proposed by Patterson (1956). This superiority is not, however, maintained
for n > 4; when n = 7, r(0, k, l) is substantially less efficient than the estimate (44) over the
range of the most useful values of ρ.
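The efficiencies of Patterson's estimates can be recomputed with a short sketch. The least-squares variance is taken from the information matrix of y_x = α − βρ^x + e_x. The variance of a ratio Σw_x y_x / Σw_x y_{x−1} follows from the same first-order argument as (30). Several details are assumptions for illustration: the weight vectors are read off (41) to (43), units are σ²/β² (β = σ = 1), and the helper names are hypothetical. The n = 7 weights come from (44) but cannot be compared with Table 1, whose column for (44) is missing.

```python
import numpy as np


def ls_variance(n, rho):
    """Asymptotic var of the least-squares rho in y_x = alpha - beta*rho**x + e_x (beta = sigma = 1)."""
    x = np.arange(n, dtype=float)
    J = np.column_stack([
        np.ones(n),                               # derivative with respect to alpha
        -rho ** x,                                # derivative with respect to beta
        -x * rho ** np.maximum(x - 1.0, 0.0),     # derivative with respect to rho (0 at x = 0)
    ])
    return np.linalg.inv(J.T @ J)[2, 2]


def linear_ratio_variance(w, rho):
    """Asymptotic var of sum(w_x y_x) / sum(w_x y_{x-1}), x = 1..n-1, in the same units."""
    w = np.asarray(w, dtype=float)
    num = (1 + rho**2) * (w @ w) - 2 * rho * (w[:-1] @ w[1:])
    den = (w @ rho ** np.arange(len(w))) ** 2
    return num / den


# w_1, ..., w_{n-1} read from the numerators of (41)-(44)
PATTERSON_WEIGHTS = {4: [-5, 1, 4], 5: [-6, -1, 3, 4], 6: [-7, -3, 2, 4, 4],
                     7: [-2, -1, 0, 1, 1, 1]}


def patterson_efficiency(n, rho):
    return ls_variance(n, rho) / linear_ratio_variance(PATTERSON_WEIGHTS[n], rho)


print(round(100 * patterson_efficiency(4, 0.0), 1), round(100 * patterson_efficiency(4, 0.5), 1))
```

At ρ = 0 the sketch reproduces the Table 1 entries 89.3, 77.4 and 65.2 for (41) to (43). The singular case ρ = 1 has to be treated as a limit and is not handled here.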


BIAS IN r(0, 1, 0)


The total bias given by (27) and (29) can be expressed as a multiple of the asymptotic
variance of r as follows: bias = θ var r.

Values of θ for the estimates r(0, 1, 0) and the estimates (41) to (44) are set out for n = 4 to 7
in Table 2.
Table 2. Bias in r(0, 1, 0) compared with Patterson's estimates

Values of θ = bias/var r

ρ | n = 4: r(0, 1, 0) | Eq. (41) | n = 5: r(0, 1, 0) | Eq. (42) | n = 6: r(0, 1, 0) | Eq. (43) | n = 7: r(0, 1, 0) | Eq. (44)
0-0 | —0333 | 6024 | —0583 | -o2 | —0700 | -0415 | -0-767 0.800 
4| — -409 122 — 821 | — a4 | —1008 | — 30 | -1:250 | — 4% 
2 | — 46 | 213 | —1042 | 2-088 | -1439 | — 346 | -1:756 — 357 
.3 — 488 | 30 | -TSi «92 | -1814 | - 2137 | -2291 — -253 
4 | — 495 | -350 | —1-408 1404 | -2185 | - 018 | -2860 | — -132 
05 | —0486 | 0411 | —1-535 0-256 | —2-530 6102 | -3461 0000 
6 | — -465 | -449 | TOS 335 | —2818 215 | —4035 | 132 
5 | — 435 | 415 | —1-642 398 | 23'019 314 | -4516 | 253 
8 — 402 -491 | —1-696 44 | -3112 395 | -4829 | 357 
9 — -367 | -499 | —1-575 479 | 3000 456 | —4-932 | 439 
10 | — -333 | -500 | —1-500 500 | —3-000 80 | -+833 800 


For n > 4 the bias in the simple regression estimate is considerably greater than for the
Patterson estimates. Thus, for example, if each y has a standard error of ±0.1β, the bias in
r(0, 1, 0) when ρ = 0.6 and n = 7 is approximately −0.06. A bias of this magnitude must be
regarded as serious, particularly if average values of ρ over several sets of data are required.

It should be noted that as n increases the bias in r(0, 1, 0) becomes relatively more
important, as predicted by Yates (quoted by Finney (1958)).


ESTIMATES r(0, k, l) WITH LOW BIAS
From the above it will be seen that the value of r(0, 1, 0) as an estimate of ρ is severely
limited. There are, however, estimates in the same family r(0, k, l) for which the biases are
much smaller.

As previously noted, the magnitude of the bias depends on k and l. In fact, if

$$\frac{k}{l} = \frac{\{(n-3)(n-1) + (n-2)\rho + (n-1)\rho^2\}F_0}{\{(n-2) + (n-1)(n-4)\rho\}F_0 + 2(n-1)F_2}, \qquad (45)$$

the bias is zero. Values of k/l giving zero bias are set out in Table 3.
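The bias ratio θ and the zero-bias ratio k/l can both be computed from (25), (27), (29) and (45) for the family r(0, k, l). The sketch below assumes D is the centring matrix I − 11'/(n − 1), which is symmetric. That makes trace DU' = trace DU, F_3 = F_1 and F_4 = F_2. The function names are illustrative, and ρ must be strictly less than 1 because F_0 vanishes at ρ = 1.

```python
import numpy as np


def centring_quantities(n, rho):
    """F0, F1, F2, trace D and trace DU for D = I - 11'/(n - 1)."""
    m = n - 1
    R = rho ** np.arange(m)
    D = np.eye(m) - np.ones((m, m)) / m
    U = np.eye(m, k=1)
    return (R @ D @ R, R @ D.T @ D @ R, R @ D.T @ U @ D @ R,
            np.trace(D), np.trace(D @ U))


def theta(n, rho, k, l):
    """bias / var r for r(0, k, l): ((27) + (29)) / (25); sigma^2 / beta^2 cancels."""
    F0, F1, F2, trD, trDU = centring_quantities(n, rho)
    bias_27 = ((l - k * rho) * trD + (k - l * rho) * trDU) / ((k + l * rho) * F0)
    bias_29 = ((2 * k * rho - l * (1 - rho**2)) * F1 - 2 * k * F2) / ((k + l * rho) * F0**2)
    var_25 = ((1 + rho**2) * F1 - 2 * rho * F2) / F0**2
    return (bias_27 + bias_29) / var_25


def zero_bias_ratio(n, rho):
    """k/l from (45)."""
    F0, _, F2, _, _ = centring_quantities(n, rho)
    num = ((n - 3) * (n - 1) + (n - 2) * rho + (n - 1) * rho**2) * F0
    den = ((n - 2) + (n - 1) * (n - 4) * rho) * F0 + 2 * (n - 1) * F2
    return num / den


print(round(theta(4, 0.0, 1, 0), 3), round(theta(4, 0.5, 1, 0), 3), round(zero_bias_ratio(6, 0.0), 2))
```

As a consistency check, these values agree with entries readable in the tables: θ = −0.333 and −0.486 for r(0, 1, 0) with n = 4 in Table 2, and k/l = 4.29 for n = 6 at ρ = 0 in Table 3.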

Thus for n = 4 the regression of y_{x+1} on 2.75y_x + y_{x+1} can be expected to produce reasonably
unbiased estimates of ρ (given by 2.75b/(1 − b), where b is the regression coefficient)
for moderate values of ρ. The actual biases in this and other cases, again expressed as
multiples of the asymptotic variances, are set out in Table 4.
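The estimate itself is easy to compute: regress y_{x+1} on z_x = ky_x + ly_{x+1} (with an intercept) to get the slope b, then take ρ̂ = bk/(1 − bl). With k = 2.75 and l = 1 this is the 2.75b/(1 − b) above, and with k = 1, l = 0 it is the simple regression estimate r(0, 1, 0). This NumPy sketch uses a hypothetical function name and is checked on noise-free data.

```python
import numpy as np


def r0kl(y, k, l):
    """r(0, k, l): regress y_{x+1} on z_x = k*y_x + l*y_{x+1}, then rho = b*k / (1 - b*l)."""
    y = np.asarray(y, dtype=float)
    prev, nxt = y[:-1], y[1:]
    z = k * prev + l * nxt
    zc = z - z.mean()
    b = zc @ (nxt - nxt.mean()) / (zc @ zc)   # least-squares slope, intercept fitted
    return b * k / (1.0 - b * l)


x = np.arange(7)
y = 10.0 - 3.0 * 0.6 ** x          # noise-free y_x = alpha - beta * rho**x with rho = 0.6
print(r0kl(y, 1, 0), r0kl(y, 2.75, 1), r0kl(y, 1.5, 1))
```

Without errors, every member of the family recovers ρ = 0.6 up to rounding. The choice of k/l matters only once the y_x carry error, through the bias terms (27) and (29).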

For moderate values of ρ such as are likely to arise in practice the biases are in each case
substantially less than the biases in the estimates r(0, 1, 0). Choice of a suitable value of
k/l does, however, become more difficult as n increases. Thus, when n = 7 the bias in the
estimate proposed by Patterson (1956) (see Table 2) is smaller than the bias in r(0, 1.5, 1)
or, indeed, in any single r(0, k, l) for most values of ρ.
Table 3. Values of k/l for zero bias in r(0, k, l)

ρ   | n = 6 | n = 7
0.0 | 4.29  | 5.22
0.1 | 2.89  | 3.28
0.2 | 2.24  | 2.44
0.3 | 1.88  | 1.99
0.4 | 1.66  | 1.71
0.5 | 1.52  | 1.52
0.6 | 1.43  | 1.40
0.7 | 1.38  | 1.32
0.8 | 1.35  | 1.26
0.9 | 1.33  | 1.23
1.0 | 1.33  | 1.21


Table 4. Bias in r(0, k, l)


Values of θ = bias/var r


ρ | n = 4: r(0, 2.75, 1) | n = 5: r(0, 1.7, 1) | n = 5: r(0, 2, 1) | n = 6: r(0, 1.7, 1) | n = 6: r(0, 1.5, 1) | n = 7: r(0, 1.5, 1)

0˙0 0-030 0-593 0-417 1-065 1-300 

‘1 — -029 369 199 -704 -925 

2 — -059 -200 031 407 624 

3 — 064 076 — 096 -161 380 

4 — 052 — -010 — +184 — -044 +182 

0-5 — 0-029 — 0-061 — 0-238 — 0-208 0-025 

6 -000 — +084 — 261 — +328 — -091 — 

7 031 — -085 — 258 — -404 — 167 — 

8 061 — -070 — 237 — 438 — +205 — 

9 088 — 046 — -204 — 436 — 214 = 
1-0 111 — 019 1067 — 408 — 200 — 

DISCUSSION 


The method described above can be fairly readily extended to further n; results are in fact
available up to n = 12 but in less detail. These results follow the pattern suggested by
Tables 1, 2 and 4. As n increases the regression estimates r(0, k, l) become less efficient
in the useful range of ρ. Thus, the efficiency at ρ = 0.75 for n = 12 is only about 63%.
The bias in the simplest regression estimate, r(0, 1, 0), becomes more serious and the range
of ρ which can be covered with low bias by a suitable choice of k/l narrows.

It should be noted that ρ increases with n if the range of x is kept constant. Thus, if
increased numbers of observations are obtained by reducing the interval in x by a factor f,
ρ is raised to the power 1/f. Consequently, the property of the r(0, k, l)
of high efficiency when ρ is small is of little value for increased n.
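The rule ρ → ρ^{1/f} is simple arithmetic. This sketch takes f as the ratio of the numbers of intervals, (n_new − 1)/(n_old − 1), which is an assumption consistent with keeping the range of x fixed. The function name is illustrative.

```python
def rho_for_finer_spacing(rho, n_old, n_new):
    """Per-step rho after covering the same x-range with n_new points instead of n_old.

    The interval in x is divided by f = (n_new - 1) / (n_old - 1), so rho -> rho ** (1 / f).
    """
    f = (n_new - 1) / (n_old - 1)
    return rho ** (1.0 / f)


print(round(rho_for_finer_spacing(0.4, 6, 21), 3))   # 0.795
```

This gives the correspondence used in the discussion that follows: ρ = 0.4 with n = 6 becomes roughly ρ = 0.8 with n = 21.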
Since, therefore, large values of ρ (in the range 0 < ρ < 1) become more important as n
increases, it is worthwhile considering the limiting case as ρ tends to 1. The efficiency of
r(0, k, l) at ρ = 1 is

$$\frac{10(n-1)}{(n+1)(n+2)} \qquad (46)$$

and tends to zero as n increases. The bias in r(0, 1, 0) is large relative even to var r(0, 1, 0),
the ratio being

$$\theta = -\frac{n^2 - 2n - 6}{6}. \qquad (47)$$

Thus for very large n the bias tends to −n³/60 times the variance of the least squares
estimate.
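Formulae (46) and (47) can be tabulated exactly with Python's `fractions` module. The function names are illustrative. For n = 4 to 7 the results match the ρ = 1.0 rows of Table 1 (100.0, 95.2, 89.3, 83.3) and Table 2 (−0.333, −1.500, −3.000, −4.833).

```python
from fractions import Fraction


def efficiency_at_rho_one(n):
    """(46): efficiency of r(0, k, l) when rho = 1."""
    return Fraction(10 * (n - 1), (n + 1) * (n + 2))


def theta_at_rho_one(n):
    """(47): bias of r(0, 1, 0) as a multiple of var r(0, 1, 0) when rho = 1."""
    return Fraction(-(n * n - 2 * n - 6), 6)


for n in (4, 5, 6, 7):
    print(n, float(efficiency_at_rho_one(n)), float(theta_at_rho_one(n)))
```

Dividing θ by the efficiency expresses the bias in units of the least-squares variance. For large n that quotient behaves like −n³/60.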

Similar results can be obtained for values of ρ < 1 but the formulae are more complicated*.
As a check, the case n = 21, ρ = 0.8 will be considered. This value of ρ corresponds to ρ
equal to about 0.4 when n = 6 for a fixed range of x. In this case

var r(0, k, l) = 0.2680σ²/β²,
bias r(0, 1, 0) = −38.62 var r(0, 1, 0),
bias r(0, 1, 1) = 1.18 var r(0, 1, 1).


It is convenient to compare these results with those for the estimate 


n-i 
bh LS P 
1221 — (48) 
X LU ST 
1 
where W_x = x(n − x)(2x − n). (49)


This estimate is fully efficient when ρ = 1 but not when ρ = 0.8. The variance and bias are
var r = 0.1219σ²/β², bias r = −1.68 var r.
In this case, therefore, the efficiency of r(0, k, l) is certainly less than 0.1219/0.2680, that is, about 45%.
The bias in 7(0, 1, 0) is very large but can be reduced considerably by choosing k/l according 
to equati n and p — 1. 
M le Mec that in general the regression equations (4) and (11) 
are unsuitable for estimating the exponent of an exponential regression equation. 
Similar methods can be adopted for estimates r(ρ0, k, l) with ρ0 other than 0. Investigation
of these estimates is not yet complete, but it is clear that the estimate r(1, 1, 1) proposed
by Hartley (1948) and the associated estimates r(1, k, l) are in general much better estimates
than r(0, k, l), although they are, of course, more difficult to calculate.
It is also of interest to investigate the properties of the estimate of ρ given by the method
of Stevens (1951) (and linear estimates generally) when the errors in the y_x are correlated.
The algebra given above can be modified for this purpose. Preliminary results suggest
that, whilst r(0, 1, 0) is fully efficient when the errors follow a first-order Markoff process, as
pointed out by Finney (1958), the Stevens estimate of ρ is subject to only small biases and
moderate losses of efficiency. In practice there are likely to be random errors in the y_x
whether or not there are also autocorrelated error components. My own view is that the
method of Stevens can be safely recommended for most biological applications.


* In general, for very large n and ρ > 0, the variance of r(0, 1, 0) is inversely proportional to n,
the efficiency is inversely proportional to n and the bias is independent of n.


SUMMARY 


The behaviour of the simple regression of y_{x+1} on y_x as an estimate of ρ in the equation


y_x = α − βρ^x,


where 0 < ρ < 1, x = 0, 1, 2, ..., n − 1 and the y_x are subject to independent errors with
equal variance, has been investigated.

As pointed out by Finney (1958), the estimate is of high efficiency and is subject to a
relatively small bias when n = 4. As n increases, however, the efficiency decreases markedly
over the useful range of ρ and the estimate is subject to an increasingly large negative bias.
These properties make the estimate unsuitable for general use.

Alternative estimates given by the regression of y_{x+1} on ky_x + ly_{x+1} have also been
considered. These estimates have the same asymptotic efficiency as the simple regression, but
by a suitable choice of k/l the bias can be considerably reduced. They are, however, of only
limited value, since in practice choice of k/l is difficult unless n is small.


I am grateful to Dr D. J. Finney for letting me see his paper in draft form before 
publication. 
f can be used. Various forms of equation (40) with Y, not necessarily zero have been
considered by R. F. White in an unpublished thesis (Iowa State College, 1956). 
TWO QUEUES IN PARALLEL


By FRANK A. HAIGHT
Institute of Transportation and Traffic Engineering, University of California


SUMMARY. The method of differential-difference equations is used to investigate the case in which
each arrival to a system of two queues joins the shorter queue, or, if they are of equal length, one
particular queue. In case each person must remain in the queue which he originally joins, relations
are obtained between the asymptotic state probabilities. If queuers are permitted to change queues
whenever it seems advantageous to do so, the formulation is simplified, and explicit expressions are
obtained.
1. INTRODUCTION 
Several writers have analysed a queueing system in which service is provided by several 
facilities operating jointly, in such a way that, to obtain service, each queuer need deal with 
only one of the service facilities. Simple examples of this system can be seen in the multiple 
queues of banks, government offices, etc. 

It has been the practice, however, to assume that arrivals to the system are assigned to the 
queues in rotation, and that they must remain in the queue to which they were assigned. 
Our experience clearly indicates that this assumption is unrealistic; people prefer to join the 
shortest queue, if there is one. Furthermore if, while they wait, some other queue becomes 
shorter, they will change queues. In this paper we limit the number of queues to two, and
study separately the cases in which queuers do and do not change queues.

Arrivals to the system will be assumed to be in the form of a homogeneous Poisson process
with parameter λ, and service will also be Poisson, with parameters μ and μ' for the two
queues. The length at time t of the queues will be denoted by X(t) and X'(t). An arrival will
join the queue characterized by X(t) if and only if, at his time of arrival,

X(t) ≤ X'(t).

If, on the other hand, an arrival finds

X(t) > X'(t),

he will join the queue characterized by X'(t). Thus the queue X(t) has a certain priority
in absorbing arrivals in the equally advantageous case, and so will be called the ‘near’
queue. Similarly, we will call X'(t) the ‘far’ queue.

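Let

$$p_{xy}(t) = \Pr\{X(t) = x,\ X'(t) = y\}, \qquad p_{x\cdot}(t) = \Pr\{X(t) = x\}, \qquad p_{\cdot y}(t) = \Pr\{X'(t) = y\}.$$

In the calculations which follow, we will be concerned mainly with statistical equilibrium
(t = ∞), and indicate this by suppression of t:

$$p_{xy} = p_{xy}(\infty), \qquad p_{x\cdot} = p_{x\cdot}(\infty), \qquad p_{\cdot y} = p_{\cdot y}(\infty).$$

Moments of these distributions will be denoted by the following system:

$$E[X(\infty)] = \sum_x x\,p_{x\cdot} = m, \qquad E[X'(\infty)] = \sum_y y\,p_{\cdot y} = m',$$
$$\operatorname{var}[X(\infty)] = v = \sigma^2, \qquad \operatorname{var}[X'(\infty)] = v' = \sigma'^2,$$
$$\operatorname{cov}[X(\infty), X'(\infty)] = r\sigma\sigma'.$$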
We will also need to consider the means and variances of the array (conditional) distributions,
and will use a system of single subscripts for this purpose:

$$m_y = E[X(\infty) \mid X'(\infty) = y], \qquad m'_x = E[X'(\infty) \mid X(\infty) = x],$$
$$v_y = \operatorname{var}[X(\infty) \mid X'(\infty) = y], \qquad v'_x = \operatorname{var}[X'(\infty) \mid X(\infty) = x].$$

That is

$$\sum_x x\,p_{xy} = m_y\,p_{\cdot y}, \qquad \sum_y y\,p_{xy} = m'_x\,p_{x\cdot},$$
$$\sum_x x^2 p_{xy} = (v_y + m_y^2)\,p_{\cdot y}, \qquad \sum_y y^2 p_{xy} = (v'_x + m_x'^2)\,p_{x\cdot}.$$

Also we will use q and q' to denote the ‘traffic intensity’ of the near and far queue, respectively,

$$q = N\rho, \qquad q' = F\rho',$$

where ρμ = ρ'μ' = λ and N and F are respectively the relative frequencies of joining the
near and far queue. Then, with the convenient abbreviations p = p_{0·} and p' = p_{·0}, we have
p + q = p' + q' = 1. This is a well-known relationship between traffic intensity and
probability of an empty queue, applied in this case to the marginal total distributions.
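The relation p + q = 1 is a flow balance. Arrivals enter the near queue at rate λN and leave it at rate μ(1 − p); the far queue behaves the same way. This can be confirmed numerically on a truncated version of the Case I system (no changing of queues). The sketch below is an illustration, not the paper's method. Its assumptions: each queue is capped at a hypothetical length `max_len` (arrivals that would exceed it are lost), and the equilibrium probabilities come from solving the balance equations directly rather than through generating functions.

```python
import numpy as np


def case_one_stationary(lam, mu, mu_p, max_len=30):
    """Equilibrium p[x, y] for Case I (no changing of queues), truncated at max_len."""
    size = max_len + 1

    def idx(x, y):
        return x * size + y

    Q = np.zeros((size * size, size * size))
    for x in range(size):
        for y in range(size):
            i = idx(x, y)
            if x <= y:                       # arrival joins the near queue
                if x < max_len:
                    Q[i, idx(x + 1, y)] += lam
            else:                            # arrival joins the far queue (y < x <= max_len)
                Q[i, idx(x, y + 1)] += lam
            if x > 0:
                Q[i, idx(x - 1, y)] += mu    # service completion in the near queue
            if y > 0:
                Q[i, idx(x, y - 1)] += mu_p  # service completion in the far queue
    np.fill_diagonal(Q, -Q.sum(axis=1))
    A = Q.T.copy()
    A[-1, :] = 1.0                           # replace one balance equation by sum(p) = 1
    rhs = np.zeros(size * size)
    rhs[-1] = 1.0
    return np.linalg.solve(A, rhs).reshape(size, size)


p = case_one_stationary(lam=1.0, mu=1.0, mu_p=0.8)
x, y = np.indices(p.shape)
N = p[(x <= y) & (x < p.shape[0] - 1)].sum()   # share of arrivals accepted by the near queue
F = p[x > y].sum()                              # share of arrivals joining the far queue
p_near_empty, p_far_empty = p[0, :].sum(), p[:, 0].sum()
print(1.0 * N, 1.0 * (1 - p_near_empty), 1.0 * F, 0.8 * (1 - p_far_empty))
```

The printed pairs agree to rounding error, because λN = μ(1 − p) and λF = μ'(1 − p') hold exactly in any stationary solution of this chain, truncated or not.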


2. FUNDAMENTAL EQUATIONS: CASE I (CHANGING QUEUES NOT PERMITTED)


The analysis in this section will consist essentially of the following steps:
(a) By considering the expression p_{xy}(t + Δt), we will obtain an expression for
∂p_{xy}(t)/∂t, x, y = 0, 1, 2, ....
(b) Assuming equilibrium as t → ∞, we find equations involving p_{xy}, x, y = 0, 1, 2, ..., by
setting ∂p_{xy}(t)/∂t = 0.
(c) The x, yth equation of the set is multiplied by s^x s'^y, and the whole set summed over
all x and y, giving a generating function equation involving the bivariate generating function

$$\phi(s, s') = \sum_x\sum_y p_{xy}\,s^x s'^y.$$

(d) This equation and its first and second partial derivatives are evaluated at
(i) s = s' = 0, (ii) s = 1, s' = 0, (iii) s = 0, s' = 1, and (iv) s = s' = 1.
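Step (d) relies on the fact that derivatives of φ at the evaluation points give probabilities and moments. For example, φ(0, 0) = p_00, φ(0, 1) = p_{0·}, φ_s(1, 1) = m and φ_ss(1, 1) = v + m² − m. The sketch below evaluates φ and its s-derivatives from an array of probabilities. The function name and the example distribution are hypothetical.

```python
import numpy as np


def pgf_and_derivatives(p, s, s_p):
    """phi(s, s') = sum_xy p[x, y] s**x s'**y, with d phi/ds and d2 phi/ds2."""
    x, y = np.indices(p.shape)
    base = p * s_p ** y
    phi = np.sum(base * s ** x)
    phi_s = np.sum(base * x * s ** np.maximum(x - 1, 0))
    phi_ss = np.sum(base * x * (x - 1) * s ** np.maximum(x - 2, 0))
    return phi, phi_s, phi_ss


# Hypothetical joint distribution with independent margins, for illustration only
p = np.outer([0.5, 0.3, 0.2], [0.6, 0.4])
phi, phi_s, phi_ss = pgf_and_derivatives(p, 1.0, 1.0)   # 1, m, v + m^2 - m
```

Here m = 0.7 and v = 0.61, so φ_ss(1, 1) = 0.4. This is the quantity written w in the equations that follow.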

In the equations corresponding to each of these steps, we will encounter four important 
terms, arising respectively from the following four possibilities: 

(A) No change in the queue lengths during the interval At. 

(B) A service termination in the far queue during the interval At. 

(C) A service termination in the near queue during the interval At. 

(D) An entry into the system during At. (This may be to either queue.) 

In addition to these, there is a trivial term of o(At) representing the probability of more 
than one transition during At; for convenience such terms will be omitted. 

It will also be useful to deal with each of the terms (A) to (D) separately, working through
the steps (a) to (d) with one term at a time. This permits more compact formulae to be used;
also it is easier to think about one type of situation at a time. Finally, the equations (a)
will be assembled from the four pieces obtained. The parts of the generating function equation
arising from (A) to (D) will be called A to D, so that the equation itself may be written

$$A(s, s') + B(s, s') + C(s, s') + D(s, s') = 0.$$
(A) No transition
The probability of no transition during Δt is

$$(1 - \lambda\Delta t)(1 - \mu\Delta t)(1 - \mu'\Delta t) \qquad (1)$$

and in forming p_{xy}(t + Δt) we multiply by the factor p_{xy}(t), since the system is in the same
state at time t. In forming the derivatives with respect to t, we must subtract p_{xy}(t) from
both sides, divide by Δt and omit terms of o(Δt). We then have

$$-(\lambda + \mu + \mu')\,p_{xy}(t). \qquad (2)$$

When x = 0, the middle factor of (1) must be deleted; when y = 0, the last factor; and
when x = y = 0, both factors. Then (2) is correspondingly simpler by the omission of μ or μ'
or both. Multiplying the x, yth member of this set by s^x s'^y and summing over all x and y, the
term of (c) corresponding to no transition is

$$A(s, s') = -(\lambda + \mu + \mu')\,\phi(s, s') + \mu\sum_y p_{0y}\,s'^y + \mu'\sum_x p_{x0}\,s^x. \qquad (3)$$

The following tabulation shows A(s, s') and its first and second partial derivatives evaluated
for zero and unit values of s and s':
s=a’=0 aman 
A Aue Ar 
4. — (A A) Pio AN. 
Ay GAD Gre P+ AP 
Ay UÀ +A) Pro —(A+p) p'(v, + m] — my) 
Are —2A+H') Por UA aA) pa npa 
Ay (A++ R’) Pn — a ER)p am 
aez0,7 = i * 1 21 
A (À +A) p Poo -O REM) ER p Rp 
A, —(A-- a a^) Py. Pie -O kp Rm pé p'm, 
A, — (A u^) pms, (Atata) m E pmé 
A, AAT) .. + 28 pss — (ck act j^) (e m! — m) - ip(e, +m — QN) 
App — (A n) p(vo- mg! — mo) — (A pi n) (o - m" — m^) + pies + me! — mi) 
Ay — (A p u^) Pym — (A p i!) (mor 4 mm?) 
(B) Departure from far queue
The probability of a departure from the far queue during Δt is

$$(1 - \lambda\Delta t)(1 - \mu\Delta t)\,\mu'\Delta t \qquad (4)$$

and, in forming p_{xy}(t + Δt), we multiply by the factor p_{x,y+1}(t), since, to be in the state x, y at
time t + Δt with a departure from the far queue in Δt, it must have been in state x, y + 1 at
time t. When the expression is multiplied out, divided by Δt, and terms of o(Δt) suppressed,
there remains

$$\mu'\,p_{x,y+1}(t). \qquad (5)$$

The only special case of (5) arises from an empty far queue; for y = 0, (5) is also zero.
Multiplying by s^x s'^y and summing, we obtain

$$B(s, s') = \frac{\mu'}{s'}\{\phi(s, s') - \phi_0(s)\}, \qquad (6)$$

where φ_0(s) is the generating function of the zero-th conditional distribution

$$\phi_0(s) = \sum_x p_{x0}\,s^x. \qquad (7)$$
For B(s, s') and its partial derivatives we have the following values:


a=e=0 ssm1,5 =0 e=0,4 — 1 8-8 21] 
B . Poi KPa K'(P— Poo) BM -p») 
B, WPu 1 b. 1 A. — Pio) 1 mp) 
B., u Pos EPs Ee’ (pmg — p + Poo) A —14-p) 
B., 2½ pri J P. ile & inf mi) 2w (pa, — Pao) Ae m — m — p'v, — p'ma + pm) 
Bee 2¼ Pos 2% P.3 A (pos + pmo? 3pm + 2p — 2Poo) A m 3m 2 — 2p) 
Bw MPi A^ p, am ,p. PI. + P10) n'(oa*r -mm' —m t my) 


(C) Departure from near queue
This case is perfectly symmetrical with the preceding one, and the relevant formulae can
be written down at once. The term required for (b) is

$$\mu\,p_{x+1,y}(t). \qquad (8)$$

The term required for (c) is

$$C(s, s') = \frac{\mu}{s}\{\phi(s, s') - \phi^0(s')\}, \qquad (9)$$

where

$$\phi^0(s') = \sum_y p_{0y}\,s'^y. \qquad (10)$$

The following table is obtained by exchanging s and s', μ and μ', x and y:


s=s8'=0 s=1,8=0 I s=s'=1 
CHP» Hp’ — Poo) Hp. M —p) 
C, Pax» ppm — p' + Poo) Ia. p(m —124- p) 
Cy py B(p.3— Pol) up, mi pm! — mop) 
Cs 2% Ae 3pm, 2p’ — 29) 3ups, M(vm?—3m-2—2p) 
Cow 2b 2U(P.2— pos) yp (oi K — mi) plv’ +m’? — m — poi — pmo + pmi) 
Cw MPa Amp. Pa 2a) Hp. ma poa h m +m) 


(D) Entrance into the system
So far the development has been fairly straightforward. The characteristic difficulty of the
problem occurs in this section, for an arrival to the system when X(t) ≤ X'(t) will produce
a different transition from the one produced if X(t) > X'(t).
The probability of an arrival is

$$(1 - \mu\Delta t)(1 - \mu'\Delta t)\,\lambda\Delta t. \qquad (11)$$

If the system is to be in state x, y at time t + Δt, then there are three possibilities for time t:

(i) If x < y, then, at time t, the system must have been in state x − 1, y, since the arrival
must have joined the near queue.

(ii) If x > y + 1, then, at time t, the system must have been in state x, y − 1, since the
arrival must have joined the far queue.

(iii) In the intermediate cases x − 1 = y and x = y, we cannot say whether the state at
time t was x − 1, y or x, y − 1, for either is possible.

Therefore, the factor which must be multiplied by (11) is of the form

$$I(x, y)\,p_{x-1,y} + J(x, y)\,p_{x,y-1}, \qquad (12)$$
where the values of I and J are given by the following table:
y70 y=l yx? yed yous 


We now show the situation at time t + Δt, giving in each cell the states from which that value
could have arisen, with an asterisk to denote an impossible situation. For convenience in
computing the generating-function equation, the appropriate powers of s and s' are given in
place of the values of x and y.


af? at a” a’? st 
s? * * * * LI 
st 0,0 1,0 0,2 0,3 0,4 
0,1 
82 * 2, 0 2, 1 1. 3 1.4 
13 1,2 
s = 3,0 3,1 3,2 2,4 
2,2 2,3 
* 4. 0 4. 1 4. 2 4. 3 
3. 3 3. 4 
* 5. 0 5. 1 5,3 5,3 
4,4 
Defining

$$\psi(s, s') = \sum_{x \le y} p_{xy}\,s^x s'^y$$

and adding the terms in bold type in the above table separately from those not in bold type,
we find

$$D(s, s') = \lambda s\,\psi(s, s') + \lambda s'\{\phi(s, s') - \psi(s, s')\}.$$

The evaluation of D and its derivatives follows: 


FIG I sls E 
D, a x 50 0 
Dr ^ ag piu bo "mo Bi Pu Pu +Pa) 
D. apurra er Apio + pmi) 


D_ss(1, 1), D_s's'(1, 1) and D_ss'(1, 1) will be evaluated later.




(13) 


4-3 
À 
A(N +m) 
AF +m’) 


Biom. 45 
Adding (3), (6), (9) and (13), we find the generating function equation of the system

$$\left\{\mu\frac{1-s}{s} + \mu'\frac{1-s'}{s'} - \lambda(1-s')\right\}\phi(s, s') + \lambda(s - s')\,\psi(s, s') = \mu\frac{1-s}{s}\,\phi^0(s') + \mu'\frac{1-s'}{s'}\,\phi_0(s). \qquad (14)$$


This equation could be differentiated to obtain the equations which follow, but it is easier
(and there is less chance of error due to division by zero) to combine the values given in the
four tables. Taking the rows from left to right in order gives


$$-\lambda p_{00} + \mu' p_{01} + \mu p_{10} = 0, \qquad (15)$$

$$-\lambda p' + \mu' p_{\cdot 1} + \lambda p_{00} = 0, \qquad (16)$$

$$-\lambda p + \mu p_{1\cdot} = 0, \qquad (17)$$

0:20, d 

— pio — Pao + I Pii + P29 + APoo = 0; (18) 

—Ap'mg + up, 11 — pt poo APoo = 9, (19) 

ANI. UI. pa, Àp t Àpio = O, (20) 

$$\lambda N = \mu(1 - p), \qquad (21)$$

N01 — K Poi +H Pos - Hii = O, (22) 

— Ap a — up a-- p.a Ap! — Apos AP + AP =0, (28) 

mb .. h + i po * HP1.™ = O, (24) 

$$\lambda F = \mu'(1 - p'). \qquad (25)$$

— Mpso — HP 20+ I Day + Hao = O, (26) 

—Ap'wy + up, ,wy — 2p' um, 2p' p — 2% = 9; e 
(where w = v + m² − m, etc.)

Az. /i. + HPs, + ADI. Apio-- Apso AP = 0, (28) 

$$D_{ss}(1, 1) = \lambda w + 2\mu m - 2\mu + 2\mu p, \qquad (29)$$


— Apos — I Doa E/! Po * tio = O, 
— p.a — Ip a WD. a- A39 — Mii Ait Aia AP 22 = O, 
— Apu, 2% ping + 2½ P — 2½ poo + 4P1.W1 = 9 
D,, 1) = Aw’ + 2m'u' — 2p! + 2u^p', 
— (FA B!) Pu BI Pio HP AC + Pro) = 9; 
— (A p) p m, * n i t Poi t A(Por+ 2211 +p'mo) = 0, 

— (À+ 4) py mi e jpg m$ — M pr. I io A(pyo + pmo) = 0, en 
$$D_{ss'}(1, 1) = \lambda(r\sigma\sigma' + mm') + \mu' m + \mu m' - \mu' p' m_0 - \mu p\, m'_0.$$


3. INDIVIDUAL QUEUE LENGTH DISTRIBUTIONS
We will use capital letters to denote compound probabilities in the conditional distributions:


* 
p. Tai = T, Va Q,, = 1- Ey; 


LÀ 
Ds Ey "| p Qiy * 1— P$. 
Using equations (17), (20), (28), and others which can be obtained by further differentiation
with respect to s, we obtain


i= = } (38) 
P. POL 2-2P et: 

For the far queue, we begin with (16), (23), (31), ..., and find 
Py - P'Q,- i- (39) 


Equations (38) and (39) generalize a formula (Haight, 1957, p. 362 (2)) which applies to the
special case in which the far queue (there called the ‘balking distribution’) is independent
of t.

Some other interesting relationships can be found by adding up directly various members
of the set of difference equations (2) + (5) + (8) + (12) = 0. For example, adding equations
for which either x or y has a fixed value N, we find


pss = MPs P NaN PP (N = 0, 1, 2. -H. (40) 
Summing over all N, 
AZ Poy =D Pry th! EP (41) 
remy 4 zcy 
On the other hand, choosing equations for which x+y = N, and adding, we obtain 
A X Pay = CE B") py sca Pray) Av UID (N E 0, 1, 2. ), (42) 
+y=N 


* 
where p is set equal to zero when it has a negative subscript.
Letting n denote the probability that the whole system contains x members, we can
write, from (42)


1 
(75.1 Pox) + p (l. o). (43) 


N = 


1 — 


4. VARIOUS EQUATIONS INVOLVING THE MOMENTS 
z 
2 21 En. Q. - 1E; 
y + LI 
Fr = Nu, Q,-1-Py 
0 
denote the compound probabilities in the marginal distributions. From the set of equations 


beginning with (19), (35), ..., the following recurrence formula in the conditional means can 
be found: 


Lj 
* ‘a 44 
[2L MIS Y PET = Ame p. *uP; A P A z Zi, (44) 
where Zo = Pow 
Z, = Pa t2pw 


Za = Doa 212 + 3022 — Pi 
Similarly, beginning with equations (24), (36), ...,
mA. = Amn py, E KP Pa - AE Zi, (45) 

where Z = 0, 

Zi = Piw 

Zs = Poot Pas 

Z3 = Pao + Par + 302 — Por 

Zi = Pao + ?pu + 304% + 4pa3 — P31 — P32 
It will be noted that | 

52. N. XZ-F. 


Thus the limiting form of (44) is (21), and of (45) is (25). 
A relationship between m and m’ can be found from (29), (33) and (37) 


D,4(1,1)-D,(1,1) =A X ( 9) pz, + 2 Pry 
4 zy 


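$$D_{ss}(1, 1) = \lambda w + 2\lambda\sum_{x \le y} x\,p_{xy}, \qquad (46)$$

and, comparing with (29),

$$\sum_{x \le y} x\,p_{xy} = \frac{1}{\rho}(m - q). \qquad (47)$$

Similarly

$$\sum_{x > y} y\,p_{xy} = \frac{1}{\rho'}(m' - q'). \qquad (48)$$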


Now 
D,(L, I) T DI, 1) D,, 1) "Ox Wey tÀ X Puy =A PL 2 + 2% + 1) Pr 
and therefore 
D(,1) = —A(N +m) - AC? +m’) — (Alp) (m — q) —(A/p’) (m'—4') 
+A X (wy + 20+ 2y +1) Pry: 
Comparing with (37), we find me 
(A=p—p!)(m +m!) + (A+ jm) + i'm) = 9. (49) 
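It is also interesting to note that since

$$D_{ss'}(1, 1) = \lambda\left\{\sum xy\,p_{xy} + \sum \max(x, y)\,p_{xy}\right\},$$

it follows from (37) that

$$\sum \max(x, y)\,p_{xy} = \frac{m - p' m_0}{\rho'} + \frac{m' - p\,m'_0}{\rho}. \qquad (50)$$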


5. CHANGING QUEUES PERMITTED 

We will now suppose that some individual (say the last) from a queue will go over to the
other queue whenever it appears advantageous to do so, that is, whenever his place in the
queue will be improved. This means that the queues can never (except for the instant required
to change queues, which we will ignore) differ in length by more than one, so that the possible
states of the system must be of the form (x − 1, x), (x, x) or (x, x − 1). Equivalently, we might say
that each discharge from a queue is to be considered as taking place from the longer queue,
unless they are equally long.
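The difference equations in equilibrium are not difficult to find, keeping in mind this
principle, and consist of the following:

State (0, 0):
$$-\lambda p_{00} + \mu p_{10} + \mu' p_{01} = 0, \qquad (51)$$
State (x, x), x > 0:
$$-(\lambda + \mu + \mu')\,p_{xx} + (\mu + \mu')(p_{x+1,x} + p_{x,x+1}) + \lambda(p_{x,x-1} + p_{x-1,x}) = 0, \qquad (52)$$
State (1, 0):
$$-(\lambda + \mu)\,p_{10} + \mu' p_{11} + \lambda p_{00} = 0, \qquad (53)$$
State (x + 1, x), x > 0:
$$-(\lambda + \mu + \mu')\,p_{x+1,x} + \mu' p_{x+1,x+1} + \lambda p_{xx} = 0, \qquad (54)$$
State (0, 1):
$$-(\lambda + \mu')\,p_{01} + \mu p_{11} = 0, \qquad (55)$$
State (x, x + 1), x > 0:
$$-(\lambda + \mu + \mu')\,p_{x,x+1} + \mu p_{x+1,x+1} = 0. \qquad (56)$$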
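Equations (51) to (56) can be checked against a direct numerical solution of the Case II system. This sketch is an illustration, not the paper's solution method. It truncates the state space at a hypothetical `max_len` (arrivals that would exceed it are lost). As in the text, a completion that would leave the queues differing by two is treated as a discharge from the longer queue.

```python
import numpy as np


def jockeying_stationary(lam, mu, mu_p, max_len=40):
    """Equilibrium p[(x, y)] for Case II (changing queues permitted), truncated at max_len."""
    states = [(x, y) for x in range(max_len + 1) for y in range(max_len + 1)
              if abs(x - y) <= 1]
    index = {state: i for i, state in enumerate(states)}
    Q = np.zeros((len(states), len(states)))

    def add(src, dst, rate):
        if dst in index and rate > 0:
            Q[index[src], index[dst]] += rate

    for x, y in states:
        # An arrival joins the near queue when x <= y, otherwise the far queue.
        add((x, y), (x + 1, y) if x <= y else (x, y + 1), lam)
        if x == y:
            add((x, y), (x - 1, y), mu if x > 0 else 0.0)
            add((x, y), (x, y - 1), mu_p if y > 0 else 0.0)
        elif x == y + 1:
            # Near queue longer: any completion leads to (y, y), a jockey moving if needed.
            add((x, y), (y, y), mu + (mu_p if y > 0 else 0.0))
        else:
            # Far queue longer (y == x + 1): any completion leads to (x, x).
            add((x, y), (x, x), mu_p + (mu if x > 0 else 0.0))

    np.fill_diagonal(Q, -Q.sum(axis=1))
    A = Q.T.copy()
    A[-1, :] = 1.0                       # replace one balance equation by sum(p) = 1
    rhs = np.zeros(len(states))
    rhs[-1] = 1.0
    pi = np.linalg.solve(A, rhs)
    return {state: pi[index[state]] for state in states}


lam, mu, mu_p = 1.0, 1.0, 0.8
p = jockeying_stationary(lam, mu, mu_p)
residual_51 = -lam * p[(0, 0)] + mu * p[(1, 0)] + mu_p * p[(0, 1)]
```

Every interior balance equation is satisfied to rounding error. The truncation affects only the states at the cap, whose probabilities are negligible for these parameter values.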
We will use the following abbreviations: 
Po = n, (At+p+p') a, ar =, 


; , (+p')(a+e'? _ 
ATO het eda n 


LA Kf?) = A’, L(K+ad+ P Ap) = A, 80 —0 - B. 


Solving (51), (53), and (55), we obtain 


: ap pp) 
Pa im. Pit I Duos: 3 


Solving (52), (54), and (56), we obtain 


a 25 58 
Pre = —BPz-1,2-1+ tape Gt? (Pu- + D- 2-1)» (58) 
ag Au ey 59 
Praa" E otk 55 a TES n2 (Pz-1,2-2 t 222,21) (59) 
ay’ / (60) 


Ai ; 
Pe, õ-1 EC 355 METI Pr-I. 2-1 [TES (Pr-1,2-2 +Pr-2,2-1) 


Substituting (57) into these three equations, we can find pis, Par and Pa. Using the equations 
repeatedly, we can find each Pyy in terms of 7 


Pre = CALS (x20) (61) 
Pr-1,2 = AuLf"m (x>1), (62) 
| (63) 


Dr, 1 = KL pn ( 1), 
and these may be verified by substitution into (52), (54), and (56). eae der hs 
(and assuming £< 1 to insure convergence; this condition es pierced ee 


find an explicit formula for 7 
(1—f) (#+A) } (64) 
7 7 (xut p- PP) * Ae. p) 
The individual queue-length distributions and the various moments may now be computed easily. Some of these values are given in the following equations (the dots used in previous sections are omitted):


* AN- 
n 
a Aup' 
P= a Wl 
Pa = KLff*n + &ÀALff*n + AuL fog, 
„ & CAT 
e 
+ Anm uL . 
pi = He b up 
p, AuLff*n + aALffm + KLB, 
T AR 
Wi EET nid 
,_ AP 
m = appt?" 
_ B+?) An 
pud er KE 
, PO P) A'n 
Lp = aan 655 
ALn frat 


ure = = Bj ue 
MOMENT ESTIMATORS AND MAXIMUM LIKELIHOOD


By L. R. SHENTON 
College of Science and Technology, Manchester 


1. Let $P(x; \theta_1, \theta_2, \ldots, \theta_\lambda)$ be the probability of the variate $x$, depending on the $\lambda$ parameters $\theta_i$ ($i = 1, 2, \ldots, \lambda$). It is assumed that $P(x;\theta)$ possesses first derivatives with respect to the $\theta_i$ and that the moments $\mu'_s$ and their first derivatives exist and are finite. Let


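$\{q_r(x)\}$ be the orthogonal system of polynomials associated with $P(x;\theta)$, and write
$$q_r(x) = \sum_{s=0}^{r} \alpha_{rs}\, x^s \quad (r = 0, 1, \ldots), \qquad (1)$$
where
$$\int P(x;\theta)\, q_r^2(x)\, dx = \phi_r, \qquad \int P(x;\theta)\, q_r(x)\, q_s(x)\, dx = 0 \quad (r \neq s). \qquad (2)$$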


To avoid undue complication at this stage we assume $P(x;\theta)$ is continuous throughout its range. We reconsider the restrictions on $P$ in a subsequent section.
Now let the formal ‘Fourier’ expansions of the derivatives of P be 


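$$\frac{\partial P(x;\theta)}{\partial\theta_j} = P(x;\theta) \sum_{r=0}^{\infty} A_{jr}\, q_r(x) \quad (j = 1, 2, \ldots, \lambda). \qquad (3)$$
For the partial sum of the series in (3) we write
$$L_j^{(r)}(x) = \sum_{k=0}^{r} A_{jk}\, q_k(x), \qquad (4a)$$
where
$$A_{jk}\,\phi_k = \int q_k(x)\, \frac{\partial P(x;\theta)}{\partial\theta_j}\, dx = -\int P(x;\theta)\, \frac{\partial q_k(x)}{\partial\theta_j}\, dx \qquad (4b)$$
if the range is independent of $\theta$ (which will be assumed throughout the subsequent argument). We consider $L_j^{(r)}(x)$ as an economized (or Tchebycheffian) polynomial approximation to $\partial\log P/\partial\theta_j$, for a given value of $r$. This being the case, we may estimate the parameters $\theta_j$ from the system of equations*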

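$$\mathcal{A}_j^{(r)} \equiv \mathrm{S}\, L_j^{(r)}(x) = 0 \quad (j = 1, 2, \ldots, \lambda), \qquad (5)$$
where the summation S is over a random sample of $N$. If $\partial\log P/\partial\theta_j$ is a polynomial of degree $r$, the corresponding equation in (5) will be the associated likelihood equation. Again, under certain conditions, (5) will become equivalent to the likelihood equations when $r \to \infty$.

Our main objective here is to determine $\operatorname{cov}(\hat\theta_{j,r}, \hat\theta_{k,r})$ for large $N$, where $(\hat\theta_{1,r}, \hat\theta_{2,r}, \ldots, \hat\theta_{\lambda,r})$ is a solution of (5). Since $E\,q_r(x) = 0$ ($r \geq 1$) and, from (4b), $A_{j0} = 0$, the estimators are consistent (a modified definition of a consistent estimator or statistic is given by Fisher (1956, p. 144)).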
* See expression (13) of my 1950 paper (to be referred to as M.L.). 
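2·1. Now consider small variations in $\theta$, which from (5) satisfy
$$\sum_{k=1}^{\lambda} \frac{\partial\mathcal{A}_j^{(r)}}{\partial\theta_k}\,\delta\theta_k + \delta\mathcal{A}_j^{(r)} = 0 \quad (j = 1, 2, \ldots, \lambda), \qquad (6)$$
where the second member refers to fluctuations in the sample moments and
$$\delta\mathcal{A}_j^{(r)} = N \sum_{k=1}^{r} A_{jk} \sum_{s=0}^{k} \alpha_{ks}\, \delta m'_s, \qquad (7)$$
where $m'_s = \mathrm{S}\,x^s/N$. But using (4b)
$$\frac{1}{N}\, E\!\left(\frac{\partial\mathcal{A}_j^{(r)}}{\partial\theta_k}\right) = \sum_{s=1}^{r} A_{js}\, E\!\left(\frac{\partial q_s(x)}{\partial\theta_k}\right) = -\sum_{s=1}^{r} A_{js}A_{ks}\,\phi_s = -(j,k)_r. \qquad (8)$$
We now write (6) in matrix form
$$\big((j,k)_r\big)\{\delta\theta\} = \{\delta\mathcal{A}^{(r)}\}/N,$$
where $\{u\}$ is a column vector and in particular $\{\delta\theta\} = \{\delta\theta_1, \delta\theta_2, \ldots, \delta\theta_\lambda\}$. It now follows that
$$\{\delta\theta\}[\delta\theta] = \big((j,k)_r\big)^{-1}\{\delta\mathcal{A}^{(r)}\}[\delta\mathcal{A}^{(r)}]\big((j,k)_r\big)^{-1}/N^2, \qquad (9a)$$
where $[u]$ is the transpose of $\{u\}$. Taking expected values we have
$$E\{\delta\theta\}[\delta\theta] = \big((j,k)_r\big)^{-1}/N, \qquad (9b)$$
provided it can be proved that
$$E\big(\delta\mathcal{A}_j^{(r)}\,\delta\mathcal{A}_k^{(r)}\big) = N\,(j,k)_r. \qquad (10)$$
First of all,
$$E\big(\delta\mathcal{A}_j^{(r)}\,\delta m'_s\big) = N\sum_{k=1}^{r} A_{jk} \sum_{t=0}^{k} \alpha_{kt}\, E(\delta m'_t\,\delta m'_s) = \sum_{k=1}^{r} A_{jk} \sum_{t=0}^{k} \alpha_{kt}\,(\mu'_{t+s} - \mu'_t\mu'_s),$$
after using the well-known expression for $\operatorname{cov}(m'_t, m'_s)$ and $\mu'_s = E\,m'_s$. Hence
$$E\big(\delta\mathcal{A}_j^{(r)}\,\delta m'_s\big) = \sum_{k=1}^{r} A_{jk} \int x^s\, q_k(x)\, P(x;\theta)\, dx,$$
since $\int q_k P\,dx = 0$ ($k \geq 1$) and $A_{j0} = 0$. But $\delta\mathcal{A}_l^{(r)}$ is a linear function of the sample moments, so
$$E\big(\delta\mathcal{A}_j^{(r)}\,\delta\mathcal{A}_l^{(r)}\big) = N\sum_{u=1}^{r} A_{lu}\sum_{k=1}^{r} A_{jk}\int q_k(x)\,q_u(x)\,P(x;\theta)\,dx = N\sum_{k=1}^{r} A_{jk}A_{lk}\,\phi_k,$$
from which (10) follows.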
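2·2. Returning to (9b) we find expressions for the asymptotic covariances as follows:
$$N\operatorname{var}\hat\theta_{1,r} = \Delta_{11}^{(r)}/\Delta^{(r)}, \qquad N\operatorname{cov}(\hat\theta_{1,r}, \hat\theta_{2,r}) = \Delta_{12}^{(r)}/\Delta^{(r)},$$
and in general
$$N\operatorname{cov}(\hat\theta_{j,r}, \hat\theta_{k,r}) = \Delta_{jk}^{(r)}/\Delta^{(r)}, \qquad (11)$$
where $\Delta_{jk}^{(r)}$ is the cofactor of $(j,k)_r$ in the matrix $\big((j,k)_r\big)$, whose determinant is $\Delta^{(r)}$.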
It is now of some interest to notice that if $r \to \infty$ in (11), and we assume that the various expansions involved remain valid, then (11) gives the usual result for the covariance of maximum likelihood estimators in terms of weighted cofactors of
$$\left(E\left(\frac{\partial\log P}{\partial\theta_j}\,\frac{\partial\log P}{\partial\theta_k}\right)\right).$$


It may be noted that there is a parallel line of thought in Geary's (1942) proof, that the 
generalized variance (asymptotically) is a minimum for maximum likelihood estimators of 
the parameters of a population which can be described in terms of frequency compartments. 
Again it is instructive to compare the basic ideas with those which arise in curvilinear 
regression when the independent variable is discrete. In the latter case the residual variance
decreases (in general) with each parameter introduced and finally (after using a finite 
number of parameters) becomes zero. Reference may be made to Aitken (1935, 1945). On 
the other hand, with a moment estimator, the variance decreases and reaches a limiting 
value only after including an infinite number of terms in the likelihood equations. 

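2·3. To put the matter in relief, suppose $P$ depends on one parameter $\theta$ only; then with
$$A_s\phi_s = \int q_s(x)\,\frac{\partial P(x;\theta)}{\partial\theta}\,dx = -\int P(x;\theta)\,\frac{\partial q_s(x)}{\partial\theta}\,dx, \qquad (12a)$$
$\hat\theta_r$ is a solution of
$$\mathrm{S}\sum_{s=1}^{r} A_s\, q_s(x) = 0, \qquad (12b)$$
with large sample variance given by
$$(N\operatorname{var}\hat\theta_r)^{-1} = A_1^2\phi_1 + A_2^2\phi_2 + \cdots + A_r^2\phi_r, \qquad (12c)$$
and, assuming the validity of the expansion, asymptotic efficiency given by
$$\frac{A_1^2\phi_1 + A_2^2\phi_2 + \cdots + A_r^2\phi_r}{E\left(\partial\log P/\partial\theta\right)^2} = \frac{A_1^2\phi_1 + A_2^2\phi_2 + \cdots + A_r^2\phi_r}{A_1^2\phi_1 + A_2^2\phi_2 + \cdots}. \qquad (12d)$$
It is clear from (12c) that we also have (for samples of $N$)
$$\operatorname{var}\hat\theta_1 \geq \operatorname{var}\hat\theta_2 \geq \operatorname{var}\hat\theta_3 \geq \cdots. \qquad (12e)$$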


3. The property indicated in (12e) for a single parameter can be generalized. For it is clear from the expansion for the generalized variance determinant given in (16) of M.L. that $\Delta^{(r)}$ is non-decreasing (considered as a function of $r$). One would expect the same sort of property to hold for $\operatorname{var}\hat\theta_{j,r}$. For simplicity consider
$$(N\operatorname{var}\hat\theta_{1,r})^{-1} = \Delta^{(r)}/\Delta_{11}^{(r)}.$$


Now it may be proved that f 

l ; (J,. 1) (1, 2) 0 re (1, 4), 3 

(219 (252) — (29e rh 

ANACHD — ACHDAN) = 6 . . B . 

55 VVV 
Airy Azry N Arry 

This result is achieved by writing
$$(j,k)_{r+1} = (j,k)_r + A_{j,r+1}A_{k,r+1}\,\phi_{r+1}$$
and then introducing the bordered forms of the determinants involving these terms, appealing to a pivotal condensation identity (see, for example, Aitken (1946), p. 49). Hence
$$\operatorname{var}\hat\theta_{j,1} \geq \operatorname{var}\hat\theta_{j,2} \geq \operatorname{var}\hat\theta_{j,3} \geq \cdots \quad (j = 1, 2, \ldots, \lambda), \qquad (14)$$
and this holds for large samples of $N$.


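4. As elementary examples of moment estimators we mention that for a Poisson, Binomial (assuming the index is known) and Normal distribution the form of $\partial\log P_x/\partial\theta$ is a polynomial, so that $\mathcal{A}^{(r)} = 0$ merely gives the usual maximum likelihood estimators. As further illustrations we mention:
(a) the truncated Poisson distribution
$$P_x = \frac{e^{-m}\, m^x}{x!\,(1 - e^{-m})} \quad (x = 1, 2, \ldots), \qquad P_x = 0 \ \text{otherwise},$$
so that
$$\frac{\partial\log P_x}{\partial m} = \frac{x}{m} - \frac{1}{1 - e^{-m}}$$
is linear in $x$, and
$$\mathrm{S}\,\frac{\partial\log P_x}{\partial m} = N\left(\frac{m'_1}{m} - \frac{1}{1 - e^{-m}}\right),$$
which leads to the likelihood equation for $m$;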
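(b) the distribution with probability density
$$P(x) = e^{-x}\{2 - a + (a-1)x\} \quad (0 \leq x < \infty,\ 1 < a < 2),$$
with
$q_0(x) = 1$, $\phi_0 = 1$, $A_0\phi_0 = 0$;
$q_1(x) = x - a$, $\phi_1 = -a^2 + 4a - 2$, $A_1\phi_1 = 1$;
$q_2(x) = (-a^2 + 4a - 2)x^2 + (4a^2 - 20a + 12)x + 2a^2 + 4a - 4$,
$\phi_2 = 4(-a^2 + 4a - 2)(-4a^3 + 15a^2 - 12a + 2)$, $A_2\phi_2 = 4(1 - a)$, and so on.
For the 'linear' moment estimator $\hat a_1$,
$$\mathcal{A}^{(1)} \propto m'_1 - a = 0, \qquad (N\operatorname{var}\hat a_1)^{-1} = A_1^2\phi_1 = (-a^2 + 4a - 2)^{-1}. \qquad (15)$$
For the 'quadratic' moment estimator $\hat a_2$,
$$\mathcal{A}^{(2)} \propto 4a^4 + a^3(m'_2 - 8m'_1 - 17) + a^2(-5m'_2 + 39m'_1 + 10) + a(6m'_2 - 44m'_1 + 6) - 2(m'_2 - 7m'_1 + 2) = 0,$$
$$(N\operatorname{var}\hat a_2)^{-1} = \frac{4a - 3}{-4a^3 + 15a^2 - 12a + 2}. \qquad (16)$$
Similarly, the 'cubic' moment estimator is given by
AP) = 15a$— a(ms — 15m; + 97m4 + 2) + a (2m — 29m; + 100m — 28)
—m l4m,— 44m + 16 = 0,
and for the variance
$$(N\operatorname{var}\hat a_3)^{-1} = \frac{15a^2 - 20a + 6}{-15a^4 + 56a^3 - 48a^2 + 8}. \qquad (17)$$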
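For illustration (a), the likelihood equation $m'_1 = m/(1 - e^{-m})$ has no closed-form root. A minimal sketch, assuming a plain Newton iteration (the function name is hypothetical and not from the text), solves it from the sample mean:

```python
import math


def truncated_poisson_m(mean, tol=1e-12, max_iter=100):
    """Root m of m / (1 - exp(-m)) = mean, the likelihood equation of illustration (a).

    Sketch: Newton's method started at m = mean, which lies to the right of the
    root because m / (1 - exp(-m)) > m for m > 0.
    """
    if mean <= 1:
        raise ValueError("the mean of a zero-truncated Poisson sample must exceed 1")
    m = mean
    for _ in range(max_iter):
        e = math.exp(-m)
        g = m / (1 - e) - mean                  # residual of the likelihood equation
        dg = (1 - e - m * e) / (1 - e) ** 2     # derivative of m / (1 - exp(-m))
        step = g / dg
        m -= step
        if abs(step) < tol * max(1.0, m):
            return m
    raise RuntimeError("Newton iteration did not converge")
```

Feeding in the population mean $m/(1 - e^{-m})$ for a known $m$ should return that $m$, which is a convenient self-check.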
The variances of the successive estimators converge fairly rapidly to a limiting value, and


for large r 


Niv Sp „ 2-a (2—a)r 
Ee oer (a- 1p Pi Jj a-1 U 


We intend to make further use of this illustrative example on another occasion. 
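The partial sums in (12c) for illustration (b) can be generated mechanically, which gives an independent check of (15) to (17) and of the ordering (12e). The sketch below assumes only the density of (b): its moments are $\mu'_k = k!\{1 + k(a-1)\}$, the orthogonal polynomials are built by Gram-Schmidt from those moments, and since $\partial P/\partial a = e^{-x}(x - 1)$, $A_s\phi_s = \int_0^\infty e^{-x}(x-1)\,q_s(x)\,dx$. Exact rational arithmetic is used so the results can be compared with the closed forms; the function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def expect(p, moments):
    """E p(x) computed from the moments mu'_0, mu'_1, ... (must cover deg p)."""
    return sum(c * moments[k] for k, c in enumerate(p))


def information_partial_sums(a, r):
    """[A_1^2 phi_1, A_1^2 phi_1 + A_2^2 phi_2, ...] up to r terms, i.e. the
    successive values of (N var a_r)^{-1} in (12c) for the density of (b)."""
    a = Fraction(a)
    mom = [factorial(k) * (1 + k * (a - 1)) for k in range(2 * r + 1)]
    exp_mom = [factorial(k) for k in range(r + 2)]  # moments of e^{-x}
    polys, sums, total = [], [], Fraction(0)
    for k in range(r + 1):
        # Gram-Schmidt: make x^k orthogonal to the lower-degree polynomials.
        p = [Fraction(0)] * k + [Fraction(1)]
        for q, h in polys:
            c = expect(poly_mul(p, q), mom) / h
            p = [pc - c * (q[i] if i < len(q) else 0) for i, pc in enumerate(p)]
        h = expect(poly_mul(p, p), mom)  # phi_k for this (monic) scaling
        polys.append((p, h))
        if k >= 1:
            a_phi = expect(poly_mul([Fraction(-1), Fraction(1)], p), exp_mom)
            total += a_phi * a_phi / h   # A_k^2 phi_k does not depend on the scaling of q_k
            sums.append(total)
    return sums
```

At $a = 3/2$ the three partial sums agree with the right-hand sides of (15), (16) and (17), and they increase with $r$ as (12e) requires.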

5·1. Application to the negative binomial distribution. Various methods of estimating the parameters have been considered by Anscombe (1950), and for three of them he gave some contours of large sample efficiency (see also Evans (1956) and Haldane (1941)). We consider the probability function
$$P_x = \frac{\alpha(\alpha+1)\cdots(\alpha+x-1)}{x!}\,\frac{\lambda^x\,\alpha^\alpha}{(\lambda+\alpha)^{\alpha+x}} \quad (\alpha, \lambda > 0;\ x = 0, 1, \ldots). \qquad (18)$$


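The orthogonal polynomials (Aitken & Gonin, 1935) are given by
$$q_s(x) = (1 - \lambda\Delta_x/\alpha)^{\alpha+s-1}\, x^{(s)}, \qquad (18a)$$
where $\Delta_x f(x) = f(x+1) - f(x)$ and $x^{(s)} = x(x-1)\cdots(x-s+1)$, and there are the additional properties
$$\phi_s = \sum_x P_x\, q_s^2(x) = s!\left\{\frac{\lambda(\lambda+\alpha)}{\alpha^2}\right\}^s \prod_{h=0}^{s-1}(\alpha+h), \qquad (18b)$$
$$\mu_{(s)} = E\,x^{(s)} = \left(\frac{\lambda}{\alpha}\right)^s \prod_{h=0}^{s-1}(\alpha+h). \qquad (18c)$$
Moreover if
$$\frac{\partial P_x}{\partial\alpha} = P_x\sum_s A_s\, q_s(x), \qquad (19a)$$
$$\frac{\partial P_x}{\partial\lambda} = P_x\sum_s B_s\, q_s(x), \qquad (19b)$$
then
$$A_s\phi_s = \sum_x q_s(x)\,\frac{\partial P_x}{\partial\alpha}, \qquad B_s\phi_s = \sum_x q_s(x)\,\frac{\partial P_x}{\partial\lambda}. \qquad (19c)$$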
But — As La- A/ + (In (1 —AA,/a)} g,(2) 20) 
a 
= As(s— 1) % a(2) +A°8(8— 1) 4207 / 26 — ..., 
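and
$$A_s\phi_s = (-1)^{s-1}(s-1)!\,(\lambda/\alpha)^s \quad (s \geq 2); \qquad A_s = 0 \quad (s = 0 \text{ or } 1). \qquad (21)$$
Similarly, using
$$\frac{\partial\log P_x}{\partial\lambda} = \frac{x}{\lambda} - \frac{\alpha + x}{\lambda + \alpha} = \frac{\alpha(x - \lambda)}{\lambda(\lambda+\alpha)},$$
$$B_s\phi_s = 1 \quad (s = 1); \qquad B_s = 0 \quad (s \neq 1). \qquad (22)$$
Clearly from (21) and (22)
$$E\left(\frac{\partial\log P_x}{\partial\alpha}\,\frac{\partial\log P_x}{\partial\lambda}\right) = 0, \qquad (23)$$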


so that @ and A are asymptotically uncorrelated (this was pointed out by ius n 
we mention in this connexion that it is also F aa y 
related estimates for a Neyman Type A distribution with two p ; 
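5·2. The truncated likelihood equations are (cf. (23) of M.L.)
$$-\frac{\alpha\,\mathrm{S}q_2(x)}{2(\lambda+\alpha)^2(\alpha+1)} + \frac{\alpha^2\,\mathrm{S}q_3(x)}{3(\lambda+\alpha)^3(\alpha+1)(\alpha+2)} - \frac{\alpha^3\,\mathrm{S}q_4(x)}{4(\lambda+\alpha)^4(\alpha+1)(\alpha+2)(\alpha+3)} + \cdots + \frac{(-1)^{r-1}\alpha^{r-1}\,\mathrm{S}q_r(x)}{r(\lambda+\alpha)^r(\alpha+1)\cdots(\alpha+r-1)} = 0 \qquad (24b)$$
with 'solution' $\hat\lambda = \mathrm{S}x/N = m'_1$ for $\lambda$. (25)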


Considered as an equation in $\alpha$, it may be shown that (24b) after simplification is of degree $2r - 3$, and in particular for $r = 2$ it corresponds to the 'moment' estimator. We have been unable to prove anything about the existence of solutions in the general case.


a exponent 


pel —— 


2.4 6 8 10 12 14 16 18 20 22 24 26 28 30 
À mean 
Fig. 1. 98 % efficiency contours for three estimators of « for the negative binomial. 


on FD o 


For the variance of $\hat\alpha$ we find (cf. M.L. (23), and Fisher (1941))
$$(N\operatorname{var}\hat\alpha_{2r-3})^{-1} = k(u_2 + u_3 + \cdots + u_r), \qquad (26)$$
where
$$k = \frac{\lambda^2}{2(\lambda+\alpha)^2(\alpha+1)}, \qquad u_s = \frac{2(s-1)!\,\lambda^{s-2}\,(\alpha+1)}{s\,(\lambda+\alpha)^{s-2}\,\alpha(\alpha+1)\cdots(\alpha+s-1)},$$
and the asymptotic efficiency of $\hat\alpha_{2r-3}$ is
$$\operatorname{Eff}(\hat\alpha_{2r-3}) = \frac{u_2 + u_3 + \cdots + u_r}{u_2 + u_3 + \cdots}. \qquad (27)$$
An indication of the efficiency for the 'linear', 'cubic' and 'quintic' estimators of $\alpha$ is given in Figs. 1 and 2, showing respectively the 98 and 90 % contours. There is obviously a considerable gain if we use the 'cubic' as against the 'linear' estimator; the gain for the 'quintic' as against the 'cubic' is not so marked, as one might expect. Of course the efficiency improves for any of the estimators as $\alpha$ increases ($\lambda$ fixed), for this corresponds to an approach to the Poisson distribution with mean $\lambda$.
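The terms of (26) and the efficiency (27) are easy to tabulate. A minimal sketch, assuming the relation $k u_s = (A_s\phi_s)^2/\phi_s = (s-1)!\,p^s/\{s\,\alpha(\alpha+1)\cdots(\alpha+s-1)\}$ with $p = \lambda/(\lambda+\alpha)$, which follows from (18b) and (21), is given below. As a cross-check it compares the full series with the Fisher information for $\alpha$ computed directly from (18) through the standard identity $\sum_j P(X > j)/(\alpha+j)^2 - \lambda/\{\alpha(\lambda+\alpha)\}$; that identity is derived independently and is not part of the text. Function names are illustrative.

```python
def nb_alpha_information_terms(lam, alpha, s_max):
    """Terms k*u_s of (26) for s = 2..s_max, using
    k*u_s = (s-1)! p^s / {s alpha(alpha+1)...(alpha+s-1)}, p = lam/(lam + alpha)."""
    p = lam / (lam + alpha)
    term = p * p / (2 * alpha * (alpha + 1))              # s = 2
    terms = [term]
    for s in range(3, s_max + 1):
        term *= (s - 1) ** 2 / (s * (alpha + s - 1)) * p  # ratio of successive terms
        terms.append(term)
    return terms


def efficiency(lam, alpha, r, s_max=600):
    """Eff(alpha_hat_{2r-3}) of (27): (u_2 + ... + u_r) / (u_2 + u_3 + ...)."""
    terms = nb_alpha_information_terms(lam, alpha, s_max)
    return sum(terms[: r - 1]) / sum(terms)


def nb_alpha_information_direct(lam, alpha, x_max=5000):
    """E(dlog P/d alpha)^2 for (18) via sum_j P(X > j)/(alpha + j)^2 - lam/{alpha(lam + alpha)}."""
    p = lam / (lam + alpha)
    pmf = (alpha / (lam + alpha)) ** alpha  # P(X = 0)
    survival = 1.0 - pmf                     # P(X > 0)
    total = 0.0
    for j in range(x_max):
        total += max(survival, 0.0) / (alpha + j) ** 2
        pmf *= (alpha + j) / (j + 1) * p     # P(X = j + 1) from P(X = j)
        survival -= pmf                      # P(X > j + 1)
    return total - lam / (alpha * (lam + alpha))
```

For instance, `[efficiency(3.0, 2.5, r) for r in (2, 3, 4)]` gives the efficiencies of the 'linear', 'cubic' and 'quintic' estimators at $\lambda = 3$, $\alpha = 2\cdot5$.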
5·3. From a practical point of view it is tedious to compute a root of (24b) if the degree is high, although one may be provided with a fairly good start (using an iterative process) from the previous equation. To give some idea of the 'weight' of computation involved we remark that (using $N^{-1}\mathrm{S}x^{(s)} = m_{(s)}$, with $\lambda$ replaced by $m_{(1)}$)
$$N^{-1}\mathrm{S}q_2(x) = m_{(2)} - (1 + \alpha^{-1})\,m_{(1)}^2,$$
$$N^{-1}\mathrm{S}q_3(x) = m_{(3)} - 3(1 + 2\alpha^{-1})\,m_{(1)}m_{(2)} + 2(1 + 2\alpha^{-1})(1 + \alpha^{-1})\,m_{(1)}^3,$$
$$N^{-1}\mathrm{S}q_4(x) = m_{(4)} - 4(1 + 3\alpha^{-1})\,m_{(1)}m_{(3)} + 6(1 + 3\alpha^{-1})(1 + 2\alpha^{-1})\,m_{(1)}^2 m_{(2)} - 3(1 + 3\alpha^{-1})(1 + 2\alpha^{-1})(1 + \alpha^{-1})\,m_{(1)}^4, \qquad (28)$$
[Fig. 2. 90 % efficiency contours for three estimators of α for the negative binomial.]
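so that the truncated likelihood equations (24b) are:

Linear:
$$A(\hat\alpha_1) \equiv \sum_{s=0}^{1} A_s\,\hat\alpha_1^{\,s} = 0, \qquad A_1 = m_{(2)} - m_{(1)}^2, \quad A_0 = -m_{(1)}^2, \qquad (29a)$$
$$(N\operatorname{var}\hat\alpha_1)^{-1} = k u_2.$$

Cubic:
$$B(\hat\alpha_3) \equiv \sum_{s=0}^{3} B_s\,\hat\alpha_3^{\,s} = 0, \qquad (29b)$$
$B_3 = m_{(2)} - m_{(1)}^2$,
$3B_2 = 9m_{(1)}m_{(2)} - 2m_{(3)} - 7m_{(1)}^3 + 6m_{(2)} - 9m_{(1)}^2$,
$B_1 = 6m_{(1)}m_{(2)} - 7m_{(1)}^3 - 2m_{(1)}^2$,
$3B_0 = -14m_{(1)}^3$;
$$(N\operatorname{var}\hat\alpha_3)^{-1} = k(u_2 + u_3).$$

Quintic:
$$C(\hat\alpha_5) \equiv \sum_{s=0}^{5} C_s\,\hat\alpha_5^{\,s} = 0, \qquad (29c)$$
$C_5 = m_{(2)} - m_{(1)}^2$,
$3C_4 = -2m_{(3)} + 12m_{(1)}m_{(2)} - 10m_{(1)}^3 + 15m_{(2)} - 18m_{(1)}^2$,
$6C_3 = 3m_{(4)} - 16m_{(1)}m_{(3)} + 36m_{(1)}^2m_{(2)} - 23m_{(1)}^4 - 12m_{(3)} + 120m_{(1)}m_{(2)} - 120m_{(1)}^3 + 36m_{(2)} - 66m_{(1)}^2$,
$3C_2 = 90m_{(1)}^2m_{(2)} - 24m_{(1)}m_{(3)} - 69m_{(1)}^4 + 72m_{(1)}m_{(2)} - 110m_{(1)}^3 - 18m_{(1)}^2$,
$6C_1 = 216m_{(1)}^2m_{(2)} - 253m_{(1)}^4 - 120m_{(1)}^3$,
$C_0 = -23m_{(1)}^4$;
$$(N\operatorname{var}\hat\alpha_5)^{-1} = k(u_2 + u_3 + u_4).$$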
If we substitute in (29) the population value of $\alpha$ and neglect terms involving $N^{-1}$ and higher powers, it may be verified that
$$E\,A(\alpha) = E\,B(\alpha) = E\,C(\alpha) = 0. \qquad (30)$$
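The procedure of § 5·3 (solve the 'linear' equation, then use each root as the starting value for the next, higher-degree equation) can be sketched as follows. The coefficient formulas are transcribed from (29a) to (29c); Horner evaluation and Newton's method are one reasonable choice of iterative process, not necessarily the author's, and all function names are hypothetical. With exact population factorial moments from (18c) each polynomial vanishes at the true $\alpha$, which is the content of (30) without the sampling error.

```python
from fractions import Fraction


def factorial_moments(xs, s_max=4):
    """m_(s) = N^{-1} S x(x-1)...(x-s+1) for s = 1..s_max, as exact fractions."""
    out = []
    for s in range(1, s_max + 1):
        total = 0
        for x in xs:
            term = 1
            for j in range(s):
                term *= x - j
            total += term
        out.append(Fraction(total, len(xs)))
    return out


def truncated_nb_equations(m1, m2, m3, m4):
    """Coefficients, highest power first, of A, B and C in (29a)-(29c)."""
    A = [m2 - m1**2, -m1**2]
    B = [m2 - m1**2,
         (9*m1*m2 - 2*m3 - 7*m1**3 + 6*m2 - 9*m1**2) / 3,
         6*m1*m2 - 7*m1**3 - 2*m1**2,
         -14 * m1**3 / 3]
    C = [m2 - m1**2,
         (-2*m3 + 12*m1*m2 - 10*m1**3 + 15*m2 - 18*m1**2) / 3,
         (3*m4 - 16*m1*m3 + 36*m1**2*m2 - 23*m1**4
          - 12*m3 + 120*m1*m2 - 120*m1**3 + 36*m2 - 66*m1**2) / 6,
         (90*m1**2*m2 - 24*m1*m3 - 69*m1**4 + 72*m1*m2 - 110*m1**3 - 18*m1**2) / 3,
         (216*m1**2*m2 - 253*m1**4 - 120*m1**3) / 6,
         -23 * m1**4]
    return A, B, C


def polyval(coeffs, x):
    """Horner evaluation; coefficients highest power first."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


def newton_root(coeffs, start, tol=1e-12, max_iter=100):
    """Newton's method for a root of the polynomial, started from `start`."""
    n = len(coeffs) - 1
    fc = [float(c) for c in coeffs]
    fd = [float(c) * (n - i) for i, c in enumerate(coeffs[:-1])]  # derivative
    x = float(start)
    for _ in range(max_iter):
        step = polyval(fc, x) / polyval(fd, x)
        x -= step
        if abs(step) < tol * max(1.0, abs(x)):
            break
    return x


if __name__ == "__main__":
    # Population factorial moments (18c) for lambda = 3, alpha = 2.
    m = [Fraction(3), Fraction(27, 2), Fraction(81), Fraction(1215, 2)]
    A, B, C = truncated_nb_equations(*m)
    a1 = -A[1] / A[0]          # 'linear' (moment) estimator
    a3 = newton_root(B, a1)    # 'cubic', started from the linear root
    a5 = newton_root(C, a3)    # 'quintic', started from the cubic root
    print(a1, a3, a5)
```

In practice one would pass `factorial_moments(sample)` in place of the population values; with a real sample the three roots differ, and their spread indicates how much the higher-degree equations change the estimate.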

5·4. We now consider briefly two examples:
(i) Fisher (1941), in discussing the negative binomial distribution, gives two examples based on tick data (p. 186). For his second example some results are summarized in the form:

N = 82,  $\hat\lambda$ = 6·5609756

 r    $\hat\alpha_{2r-3}$    $\sigma(\hat\alpha_{2r-3})$
 2    1·549    3·48/√N
 3    1·594    3·04/√N
 4    1·592    2·91/√N
 ∞    1·778    2·80/√N

(ii) Haldane (1941), quoted in M.L. p. 116:

N = 1096,  $\hat\lambda$ = 2·1569343

 r    $\hat\alpha_{2r-3}$    $\sigma(\hat\alpha_{2r-3})$*
 2    10·385    83·0/√N
 3    9·918     82·8/√N
 4    9·906     82·8/√N
 ∞    9·900     82·8/√N

* Calculated for the values λ = 2·157, α = 10.
The efficiency for œ in (i) is only 64 9% while for (ii) it is 98 9%. 
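6. Remarks on the validity of the expansions.
6·1. We now consider under what conditions it is possible to assert that $\Delta_{11}^{(r)}/\Delta^{(r)} \to N\operatorname{var}\hat\theta_1$ as $r \to \infty$, where $\hat\theta_1$ is the maximum likelihood estimator (and similarly for the other covariances). Basically this reduces to the validity of
$$E\left(\frac{\partial\log P(x;\theta)}{\partial\theta_j}\,\frac{\partial\log P(x;\theta)}{\partial\theta_k}\right) = \sum_{s} A_{js}A_{ks}\,\phi_s, \qquad (31)$$
which in turn depends on that of
$$E\left(\frac{\partial\log P(x;\theta)}{\partial\theta_j}\right)^2 = \sum_{s} A_{js}^2\,\phi_s. \qquad (32)$$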
Thus we have to consider the conditions under which Parseval's formula holds for $(1/P)(\partial P/\partial\theta_j)$ with respect to the weight function $P(x;\theta)$. Clearly we must assume the existence of the moment sequence $\{\mu'_s\}$, so that there is a tie-up with the problem of moments and a property of the solutions due to M. Riesz (see, for example, Shohat & Tamarkin (1943, pp. 61-8 and Theorem 2·20)). Riesz has shown that if there is a unique $\psi(x)$ (or what is called an extremal solution) such that
$$\int x^s\, d\psi(x) = \mu'_s \quad (s = 0, 1, 2, \ldots),$$
then Parseval's formula
$$\int \{f(x)\}^2\, d\psi(x) = \sum_{s=0}^{\infty} f_s^2/\phi_s, \qquad (33)$$
where $f_s = \int f(x)\, q_s(x)\, d\psi(x)$, holds for every $f(x)$ of the class $L^2_\psi$ (this implies the measurability of $f(x)$ with respect to $\psi(x)$ and the convergence of $\int f^2(x)\, d\psi(x)$). Thus in our application it is certainly necessary for the Stieltjes integral appearing as the first member of (32) to converge.

6·2. We now mention briefly various criteria for deciding whether the moment problem is determined (see Shohat & Tamarkin (1943), pp. vii-viii, 19-22; also Kendall (1943, pp. 106-10)). For the Stieltjes moment problem (variate range 0 to ∞; $\psi(x)$ constant elsewhere) there is a unique solution if
$$\int_0^\infty \frac{d\psi(x)}{z + x} \sim \cfrac{1}{a_1 z + \cfrac{1}{a_2 + \cfrac{1}{a_3 z + \cfrac{1}{a_4 + \cdots}}}} \qquad (34)$$
is such that $a_s > 0$ and $\sum a_s$ diverges. A difficulty here is that it may not be possible to find a comparatively simple expression for $a_s$, as is the case, for example, with Neyman's Type A with two parameters. But for Poisson moments (mean $m$) it may be shown that $a_{2s+1} = m^s/s!$ and $a_{2s} = (s-1)!/m^s$, so that $\sum a_s = e^m + \sum_{s=1}^{\infty}(s-1)!/m^s$, which clearly diverges. Similarly, for the negative binomial (18) it appears that
o P 1 AÀ (+A) a (1+ Aa) 2A +209) (35) 


* 


0 72 IT 27 17 z+ 1+ 
so that after an equivalence transformation we find 


* 

m $ s! (1--212) $ fa 

A "One TT area) one Itn AY 
r=0 


(36) 


which diverges since a term of the first infinite series is 00 1A. Hence if we assume the 
stochastic convergence of $\hat\alpha_{2r-3}$ to the maximum likelihood estimator, then (26) remains valid as $r \to \infty$.

We also note here that the normal distribution is determined by its moments, as may be verified by an appeal to Carleman's series test (Kendall, 1943).

6·3. It is also possible to say something about Gram-Charlier series consisting of a finite number of terms. Thus for a Type A series based on a normal density, Parseval's theorem holds provided the frequency is always positive (see Shenton (1954, pp. 80-2)). Thus the expansions given by me in the paper on the efficiency of the method of moments and the Gram-Charlier Type A distribution (1951) converge under certain parameter restrictions.
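For example, for the maximum likelihood estimator $\hat\alpha_4$ of $\alpha_4$ in
$$P(x;\alpha_4) = C(x)\,g(x),$$
where $C(x) = 1 + H_4(x)\,\alpha_4/4!$, $g(x) = (2\pi)^{-1/2}\exp(-\tfrac{1}{2}x^2)$, $x = (X - m)/\sigma$ and $H_4(x)$ is the Hermite polynomial of degree 4, it turns out that if $\sigma$ is known,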


al «(^ an (37) 
Xv lon 
Parseval's expansion holds and is equivalent to the convergence (as $r \to \infty$) to zero of


LJ r 2 
[aeuo l u ef de (38) 
with $0 < \alpha_4 < 4$, where $\{q_s(x)\}$ is the orthogonal system with respect to $g(x)\,C(x)$ (see the 1951 paper, (35) and (36)).

Similarly, if $\psi(x)$ is a determined solution of a Stieltjes moment problem, then it may be
shown that the various Parseval expansions which arise in connexion with the estimation 
of the parameters of a Gram-Charlier Type B development (consisting of a finite number of 
terms) expressible in the form 


P(x) = [may (39) 


where 7(t) is a non-negative polynomial, converge (Shenton, 1957, pp. 153-6). Thus, in 
view of the remarks in § 6-2, convergence questions arising in connexion with the estimation 
of parameters in Gram-Charlier Type B based on the Poisson and Negative Binomial (or 
geometric) distributions are readily settled. 


7. Concluding remarks. The large sample moment estimators we have introduced here 
have the unusual property that although they involve higher moments this does not imply 
larger sampling variances; on the contrary the sampling variances decrease as higher moments 
are introduced. As far as we are aware examples of this sort of behaviour are rarely met in the 
literature. It may turn out that there is a similar property for the covariances. It must be 
mentioned, however, that the property of the variances might have been anticipated when 
it is recalled that the estimators (under certain conditions) ultimately converge to maximum 
likelihood estimators. 

Our treatment here has been mainly formal and general, and covers discrete and con- 
tinuous distributions. We reserve for another occasion a discussion of the formula for the 
covariance matrix (and its relation to the derivations given by earlier writers) and remarks 
on the distributions of the moment estimators. 


I have to thank Mr A. Fletcher for drawing Figs. 1 and 2 and for assisting in some of the
computations. 
EFFECT OF NON-NORMALITY ON THE POWER
FUNCTION OF t-TEST 


By A. B. L. SRIVASTAVA
Statistical Laboratory, Indian Institute of Technology, Kharagpur


1. INTRODUCTION


Student's t-statistic provides a suitable test of significance for the mean when the sample 
comes from a normal population. The power of the normal theory test has been studied by 
Neyman (1935); Neyman & Tokarska (1936) and Johnson & Welch (1939). As in many 


cases the samples appear to belong to populations other than the Gaussian, it is necessary 
to see how far the normal theory test can be assumed to be valid in controlling the Type I 
and Type II errors of inference on non-normal samples. The effect of non-normality on the 
Type I error of Student's t-test was studied experimentally by Pearson & Adyanthaya (1929) 
and theoretically by Bartlett (1935), Geary (1936, 1947) and Gayen (1949). Assuming the 
parent population to be specified by the first two terms of the Edgeworth series, Geary
(1936) obtained the approximate distribution of $t$ for any sample size, and later this work
was extended by Gayen (1949) by including in the distribution the effects of parental
kurtosis $\lambda_4 = \beta_2 - 3$ and $\lambda_3^2 = \beta_1$. Apart from the pioneer empirical study by Pearson &
Adyanthaya (1929, pp. 276-80), the effect of non-normality on the Type II error (and hence 
on the power) of the t-test was first studied by Ghurye (1949). He has, however, considered 
only the effect of skewness of the population, for he started with the joint distribution of the 
mean and the variance for populations specified by the first two terms of the Edgeworth 
series (Geary, 1936). In this paper, it has been possible to study the effects of kurtosis and 
skewness of the parent population which may be assumed to cover a larger range of non- 
normality. Gayen’s (1949) formulae for the joint distribution of the sample mean and 
variance for the first four terms of the Edgeworth population have been utilized for the 
derivation of the corrective terms of the power function. 

The method followed by Ghurye for the evaluation of the corrective term of the power function due to $\lambda_3$ appears to be satisfactory for the derivation of the effects due to higher odd-order cumulants. But for those of the even-order cumulants his method does not appear to be useful, as Ghurye himself encountered some 'analytic difficulties'. In this paper, by a different approach, it has been possible to evaluate the integrals involved in the power function due to $\lambda_4$ and $\lambda_3^2$.

The non-normal population considered here is supposed to be characterized by non-zero values of the standardized third and fourth cumulants. Since the effects of the higher-order terms depending on $\lambda_5$, $\lambda_6$, $\lambda_3\lambda_4$, $\lambda_3^3$, ... are assumed to be negligible, the population covered is only moderately non-normal. Too high values of $\lambda_3$ and $\lambda_4$ cannot be permitted either, as they will make $f(x)$ negative at one or both tails and will give rise to subsidiary modes. To ensure a positive definite, unimodal frequency function, $\lambda_4$ should lie roughly between 0 and 2·4 and $\lambda_3^2 < 0·2$ (Barton & Dennis, 1952).

Also it is found possible in this paper to calculate in the non-normal case the critical region 


† I am grateful to Prof. E. S. Pearson for kindly drawing my attention to this work.
Biom. 45 
corresponding to any fixed probability level, to a sufficient degree of accuracy by using
the Cornish—Fisher (1937) technique. 


2. DERIVATION OF THE EXPRESSIONS 


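Let the variate $\xi$ have mean $\mu$, standard deviation $\sigma$ and third and fourth cumulants $\kappa_3 = \lambda_3\sigma^3$ and $\kappa_4 = \lambda_4\sigma^4$ respectively. Assume all the higher cumulants to be zero, so that, to the third approximation of the law of error, the frequency function of $\xi$ is
$$\frac{1}{\sigma}\,f\!\left(\frac{\xi-\mu}{\sigma}\right) = \frac{1}{\sigma}\left\{\phi(x) - \frac{\lambda_3}{6}\,\phi^{(3)}(x) + \frac{\lambda_4}{24}\,\phi^{(4)}(x) + \frac{\lambda_3^2}{72}\,\phi^{(6)}(x)\right\}, \qquad x = \frac{\xi-\mu}{\sigma}, \qquad (1)$$
where $\phi(x) = (2\pi)^{-1/2}e^{-x^2/2}$ and $\phi^{(v)}(x)$ denotes the $v$th derivative of $\phi(x)$.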
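Since $\phi^{(v)}(x) = (-1)^v He_v(x)\,\phi(x)$ with $He_v$ the Hermite polynomials, (1) can be written $\phi(x)\{1 + \tfrac{\lambda_3}{6}He_3 + \tfrac{\lambda_4}{24}He_4 + \tfrac{\lambda_3^2}{72}He_6\}$. The sketch below evaluates this form and checks numerically that it has unit area, mean 0, variance 1, third moment $\lambda_3$ and fourth moment $3 + \lambda_4$, i.e. that $\lambda_3$ and $\lambda_4$ are indeed the standardized third and fourth cumulants. Simpson's rule on $[-12, 12]$ is an arbitrary numerical choice made here for illustration.

```python
import math


def edgeworth_density(x, lam3, lam4):
    """Standardized density (1), written with Hermite polynomials He_v."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
    he3 = x**3 - 3 * x
    he4 = x**4 - 6 * x**2 + 3
    he6 = x**6 - 15 * x**4 + 45 * x**2 - 15
    return phi * (1 + lam3 / 6 * he3 + lam4 / 24 * he4 + lam3**2 / 72 * he6)


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with n (even) intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def edgeworth_moments(lam3, lam4, k_max=4, limit=12.0):
    """Raw moments E x^k, k = 0..k_max, of the density (1)."""
    return [simpson(lambda x: x**k * edgeworth_density(x, lam3, lam4), -limit, limit)
            for k in range(k_max + 1)]
```

The function does not check positivity: for values of $\lambda_3$ and $\lambda_4$ outside the limits quoted in the Introduction, `edgeworth_density` can return negative values in the tails.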
Let us take a sample of $n$ independent values $\xi_1, \xi_2, \ldots, \xi_n$ from this population. Now if $x_i = (\xi_i - \mu)/\sigma$ ($i = 1, 2, \ldots, n$), the joint distribution of $\bar x = \sum x_i/n$ and $s = \{\sum(x_i - \bar x)^2/n\}^{1/2}$ can be obtained from Gayen's (1949) formula (2·11). It is found to be
nin 
TIB — Vn) 230-3 


965,8) = 8" exp {— 4n(s?+2)} fı + e {T3 — 3% + 382 


eee 1 gde, 
+ AE pat 3(2n + 8) 244+ ohn 0-18 + hire flu 3) 2246) 

72 3(n—1) Enn — 2) | | 2 
+ ot (nat eu eee"). b 


Suppose we have to test the hypothesis $H_0$ ($\mu = \mu_0$) or $H_0$ ($\mu \leq \mu_0$) against the set of alternatives $\mu > \mu_0$. We know that for such a set of one-sided alternatives Student's $t$-test is uniformly most powerful; therefore we find the value of Student's ratio $t$, say $t_0$, such that the probability of exceeding this value is a predetermined $\alpha$. Our test procedure will then be to reject the hypothesis $H_0$ if the observed value of $(\bar\xi - \mu_0)\sqrt{n-1}/(s\sigma)$ exceeds $t_0$, and accept it otherwise. Now if the hypothesis $H_0$ is not true, and $\mu$ has some alternative value $\mu_1$, we shall determine what probability the test has of rejecting $H_0$ in this situation. This probability gives the power of the test for the particular alternative. The complementary probability will give the probability of accepting $H_0$ when $\mu = \mu_1$, which is the error of the second kind. This will be obtained by integrating (2) over the region of acceptance, i.e. the domain bounded by
$$\frac{(\bar\xi - \mu_0)\sqrt{n-1}}{s\sigma} \leq t_0, \quad \text{i.e.} \quad \bar x \leq \frac{t_0\, s}{\sqrt{n-1}} - \frac{p_n}{\sqrt{n}} = T, \ \text{say},$$
where $p_n = (\mu_1 - \mu_0)\sqrt{n}/\sigma$.


It may be pointed out here that if the value of $t$ corresponding to the level of significance $\alpha$ is taken from the usual $t$-tables, the actual probability of error of the first kind will not be $\alpha$. However, if we use such a value of $t$, the necessary correction to $\alpha$ due to non-normality can be obtained from the results given in Gayen's (1949) paper. It will be possible to obtain similar corrections to the probability of error of the second kind from the results given here. To obtain the value of $t_0$ which would correspond to a predetermined level of significance $\alpha$ in the non-normal case, we have to take recourse to the well-known Cornish-Fisher (1937) technique.
We can write the probability of the error of the second kind as
$$P_{II}(t_0, n-1, p_n) = \int_0^\infty\!\int_{-\infty}^{T} g(\bar x, s)\, d\bar x\, ds = P_0(t_0, n-1, p_n) + P_{\lambda_3}(t_0, n-1, p_n) + P_{\lambda_4}(t_0, n-1, p_n) + P_{\lambda_3^2}(t_0, n-1, p_n). \qquad (3)$$


In this, $P_0(t_0, n-1, p_n)$ is the probability corresponding to the normal population, which has been dealt with in the papers of Neyman (1935), Neyman & Tokarska (1936) and others. The power function is given by the expression $1 - P_{II}(t_0, n-1, p_n)$, of which the term $1 - P_0(t_0, n-1, p_n)$ is the 'normal theory' power, and $P_{\lambda_3}$, $P_{\lambda_4}$ and $P_{\lambda_3^2}$ are the correction terms for non-normality.
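The 'normal theory' power $1 - P_0(t_0, n-1, p_n)$ is the upper-tail probability of a non-central $t$ with $n - 1$ degrees of freedom and non-centrality $p_n$, since $(\bar\xi - \mu_0)\sqrt{n-1}/(s\sigma) = (Z + p_n)/\sqrt{V/(n-1)}$ with $Z$ standard normal and $V$ an independent $\chi^2_{n-1}$. A minimal stdlib-only sketch, which integrates $\Phi(p_n - t_0\sqrt{V/(n-1)})$ against the $\chi^2$ density with Simpson's rule (the integration scheme and function name are choices made here, not the paper's method), is:

```python
import math
from statistics import NormalDist


def normal_theory_power(p_n, t0, df, steps=4000):
    """1 - P_0(t0, df, p_n) = P{(Z + p_n)/sqrt(V/df) > t0}, V ~ chi-square(df).

    Computed as E_V[Phi(p_n - t0*sqrt(V/df))] by Simpson's rule; assumes df > 2 so
    that the chi-square density vanishes at the origin.
    """
    if df <= 2:
        raise ValueError("this sketch assumes df > 2")
    phi = NormalDist().cdf
    log_const = -(df / 2) * math.log(2) - math.lgamma(df / 2)

    def integrand(v):
        if v <= 0:
            return 0.0
        density = math.exp(log_const + (df / 2 - 1) * math.log(v) - v / 2)
        return density * phi(p_n - t0 * math.sqrt(v / df))

    upper = df + 40 * math.sqrt(2 * df)  # chi-square mass beyond this is negligible
    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3
```

For $n = 10$ and $t_0 = 1·8331$ (the upper 5 % point for 9 degrees of freedom), `normal_theory_power(p, 1.8331129, 9)` for $p = 0, \ldots, 4$ gives normal-theory values that can be set beside the $\lambda_3 = \lambda_4 = 0$ entries of Table 2.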

We shall now derive the expressions for $P_{\lambda_3}$, $P_{\lambda_4}$ and $P_{\lambda_3^2}$, considering each of them separately. In what follows, we shall write


ü Pat nin 
ME Bai 1: m —L8 be A - — - F 
4 ay(n—1)’ An IIA — ) 2k 
First of all, we have 
(n — 1)! elei? 


P, (to, n—l,p,)-— ern- 1)] (nm) Qhn-2) qnt1 


x [nee +2) Hh,(— 5) — 2a*bHh, ,(— 5) + 2 (pl —3n +2) Hh, ( 550 ; (4) 
where $Hh_v(-b) = \dfrac{1}{v!}\displaystyle\int_{-b}^{\infty}(t + b)^v e^{-t^2/2}\, dt$.


This is Ghurye's (1949) result, expressed in terms of the well-known Hh-function, tables of 
which have been provided by Airey (1931). 
We now proceed to derive $P_{\lambda_4}$, which is given by


w 
n 


Pillon- ps) = f, eren — 2n + 2n) zo) de 


‘ 24 Jo 
(n— 4, nstgn—: — ct (n-1) 
en 8 [n ^ oP ni jg ds 


. say, (5) 
where (2%) = fe e-z dz, 


On substituting for a in Z, and reducing, we find 
(le-, [rs Dt e i 5 


* — aninga Vn -1) 
- saa +1) py Hl - D) -n 1) HB, (0) 
Ohne CE —b)|. (6) 
b On, a) 
The second integral $I_2$ can be written as
—1)A,[ 1 2 (n— 1) 1 
y^ = site [ss I= nt n V. ( ) 
where pu = [tme ds, (8) 


27-2 
On changing the order of integration, after suitable transformation of variables in y...
we find 
e- doi 


Yn = Tl f expt- ia +#/(n—1)}8? + stp, Jn] (n — 1)] e" da dt 
e o f (Pn Am)r eo 


exp [—4n(1 - /n 1)) s?] sn" qs qt 


VI) b III) Jo 
. c , mno Dik (m+r+2)) fe It / Vn Ir di 
i r! nma -o [1 4-&/(n — 1) 
e- don Zim- (m 4- 1)] 2 2 IIA I)] cee r+l m+ 
zi qim > rl [ ur 1 ( CARO )| , 0) 


where $w = t_0^2/(t_0^2 + n - 1)$ and $I_w(p, q)$ denotes the incomplete B-function.
On substituting for $\psi_n$ and $\psi_{n+2}$ in $I_2$ and using the recurrence relations of the incomplete B-function, we get


fy = Mae teh uie 4, & (p, Ju) gin 
: Spin +1) rað rl 


FEQ 4 7)] [(m +r) 0 — up) — (n+) 


(-e Ader (s —1)1 4,4, LONE a? 10) 
- eran. he- m, so]. 
since n! Hh, (— b) can be expanded as 
e y EE IAG rA I) l. 
r=0 T 
Substituting the formulae (6) and (10) for $I_1$ and $I_2$, we finally get
T — (n. — I) le- DU CAS 1) a?-+2(n+4)} Hh, (9) 
P, (h.m bir mer s prd Vincit )a?+2(n+4)} Why sy 
3a? 
ate , Hh —b)— DGA NEUN 
3 
tan DOA On+8) B, 4-5) Y qa 


Lastly, we derive the expression for $P_{\lambda_3^2}$ in the same way as that for $P_{\lambda_4}$. We have, after some simplification as in the case of $P_{\lambda_4}$, the expression
(n. — I) He- 


Pra(to; m= l, Pn) icd 72T[3(n— 1)] 21-2, /n ania 


C 1) (n +2) (n + 3) (a? + 2)? 


(n 
x ap, Hin al b) + {—2(n + 1) (ba? +4) p? + 2(n + 1) (3n.4- 2) a? 
+4(n+1)(3n+8)— 10 , Hs -0) bel 
— 6(3n ＋ 2) a — 24) 4, Hh, ( — b) +{— 58 + 6(3n + 2) p2—3n(3n-+4)} 


+ 6(02— A) c Hil —b) + (n + 1) (m+ 2) (a? 2) (5a? — 2) 


x weet Hh, ,(— b) + (pd —2(3n + 2) p3, + 3(3n?+ 6n—4)} 


12 
x 80 10 HÀ, 4(— »]. (12) 
3. TABULATION OF POWER FUNCTIONS


Using the expressions obtained above, some values of $P_{\lambda_3}$, $P_{\lambda_4}$ and $P_{\lambda_3^2}$ have been calculated and tabulated in Table 1, for $\alpha$ = 0·05, $n$ = 5, 10, 20 and integral values of $p_n$ from 1 to 5. The use of the recurrence relation of the $Hh$-functions, viz.
$$nHh_n(-b) = b\,Hh_{n-1}(-b) + Hh_{n-2}(-b),$$
to reduce (11) and (12) to expressions involving only $Hh_{n-1}(-b)$ and $Hh_{n-2}(-b)$, was found convenient for tabulation purposes. The values of $t_0$ used were obtained from the usual tables corresponding to the upper tail area equal to 0·05. The accuracy of the results is conditioned by that of the values of $t_0$ obtained from the table. The results are expected to be correct to four places of decimals.
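The recurrence above, started from $Hh_{-1}(x) = e^{-x^2/2}$ and $Hh_0(x) = \int_x^\infty e^{-t^2/2}\,dt$, is the usual way to generate the $Hh$-functions tabulated by Airey (1931). The sketch below uses the general form $nHh_n(x) = -x\,Hh_{n-1}(x) + Hh_{n-2}(x)$, which at $x = -b$ is the relation quoted in the text, and writes $Hh_0$ through `math.erfc`. For $x < 0$ (i.e. $b > 0$) every term of the forward recurrence is positive, so it is numerically stable there.

```python
import math


def hh(n, x):
    """Hh_n(x) = (1/n!) * integral_x^inf (t - x)^n exp(-t^2/2) dt, for n >= -1.

    Uses n Hh_n(x) = -x Hh_{n-1}(x) + Hh_{n-2}(x), with Hh_{-1}(x) = exp(-x^2/2)
    and Hh_0(x) = sqrt(2 pi) (1 - Phi(x)) = sqrt(pi/2) erfc(x / sqrt(2)).
    """
    prev = math.exp(-0.5 * x * x)                                 # Hh_{-1}
    if n == -1:
        return prev
    cur = math.sqrt(math.pi / 2) * math.erfc(x / math.sqrt(2))   # Hh_0
    for k in range(1, n + 1):
        prev, cur = cur, (-x * cur + prev) / k
    return cur
```

The values at $-b$ needed in (4), (11) and (12) are `hh(n, -b)`.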


Table 1. Giving the values of $P_{\lambda_3}$, $P_{\lambda_4}$ and $P_{\lambda_3^2}$ for $\alpha$ = 0·05


| Py, Py, Py P, | Py, | Py 
nu] 
| =0 Pa=l 


Pa 
5 0-0343 0-0030 ~0-0140 0-0075 | —0-0164 0-0165 
10 0297 -0009 — -0080 0628 — 0137 0101 
20 -0229 0002 — 0042 0450 — -0088 -0104 
| 9.2 
* aa | 
5 0-0361 — 0-0380 0-0438 — 0-0559 —0-0131 
10 0046 — 0207 0180 — 0597 0024 
20 — 0048 — -0099 0066 — -0445 0030 
| | 
Pr=4 p.75 
I 
| ^38 — 0:0649 0-0143 —0-0281 — 0-0260 0-0112 
| 10 — 0340 -0089 — 0048 — -0062 0025 
20 — 0191 0040 — -0003 — -0025 0007 
L 


4. DETERMINATION OF THE CRITICAL REGION CORRESPONDING
TO A PREDETERMINED LEVEL OF SIGNIFICANCE


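We know from the paper of Cornish & Fisher (1937) that if $z_\alpha$ is a deviate corresponding to some assigned probability level, say $\alpha$, and has a distribution with known cumulants $\kappa_1, \kappa_2, \ldots$, and $x_\alpha$ is the normal deviate for the same probability level, then
$$\frac{z_\alpha - \kappa_1}{\sqrt{\kappa_2}} = x_\alpha + \frac{\kappa_3}{6\kappa_2^{3/2}}(x_\alpha^2 - 1) + \frac{\kappa_4}{24\kappa_2^2}(x_\alpha^3 - 3x_\alpha) - \frac{\kappa_3^2}{36\kappa_2^3}(2x_\alpha^3 - 5x_\alpha) + \cdots. \qquad (13)$$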
Now the cumulants of the $t$-distribution based on samples of size $n$ for our present non-normal population, up to $O(n^{-2})$, derived from the formulae given by Geary (1947), are
as follows: A 3 
Kms 33- (1 +Š) , 
2,/n 4n 


2 2 
* = Leg OE „A - 840, 
ye ae 
usta) 
rae = (6— 22,4 123) +5 (18— 6A, +2522). 


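If we denote by $t'_\alpha$ the percentage point of the above $t$-distribution, and write $\gamma_1 = \kappa_3/\kappa_2^{3/2}$ and $\gamma_2 = \kappa_4/\kappa_2^2$, then retaining the first four terms of (13) we get
$$\frac{t'_\alpha - \kappa_1}{\sqrt{\kappa_2}} = x_\alpha + \frac{\gamma_1}{6}(x_\alpha^2 - 1) + \frac{\gamma_2}{24}(x_\alpha^3 - 3x_\alpha) - \frac{\gamma_1^2}{36}(2x_\alpha^3 - 5x_\alpha). \qquad (14)$$
Also, for $n > 5$, we have for the cumulants of the ordinary $t$-distribution with $n - 1$ degrees of freedom
$$\kappa_1 = 0, \qquad \kappa_2 = \frac{n-1}{n-3}, \qquad \kappa_3 = 0, \qquad \kappa_4 = \frac{6(n-1)^2}{(n-3)^2(n-5)}.$$
Hence, if $t_\alpha$ is the $100\alpha$ % point of the $t$-distribution, we shall have from (13), for $n > 5$,
$$t_\alpha\sqrt{\frac{n-3}{n-1}} = x_\alpha + \frac{1}{4(n-5)}(x_\alpha^3 - 3x_\alpha). \qquad (15)$$
Now from (14) and (15), we get
$$t'_\alpha = \kappa_1 + \sqrt{\kappa_2}\left[t_\alpha\sqrt{\frac{n-3}{n-1}} + \frac{\gamma_1}{6}(x_\alpha^2 - 1) + \left\{\frac{\gamma_2}{24} - \frac{1}{4(n-5)}\right\}(x_\alpha^3 - 3x_\alpha) - \frac{\gamma_1^2}{36}(2x_\alpha^3 - 5x_\alpha)\right]. \qquad (16)$$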


Using (16), it is found that the upper 5 % point of the $t$-distribution for $n - 1 = 9$ degrees of freedom, corresponding to a non-normal population having $\lambda_3$ = 0·6, $\lambda_4$ = 0·4, is 1·6266. On actual integration it gives $\alpha$ = 0·0509. This shows that the above method yields results to a sufficient degree of accuracy.
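Formulae (13) to (16) translate directly into code. In the sketch below the cumulants of the non-normal $t$-distribution (Geary's expressions, which are only partly legible above) are treated as inputs supplied by the caller; the function names are hypothetical. Two checks follow from the algebra: (15) should be close to the tabled $t$ point (1·8331 for 9 degrees of freedom), and when the ordinary $t$ cumulants are passed to (16) the correction terms cancel and $t'_\alpha = t_\alpha$ exactly.

```python
import math
from statistics import NormalDist


def cornish_fisher(x_alpha, k1, k2, k3, k4):
    """Four-term Cornish-Fisher expansion (13)/(14) for the deviate z_alpha."""
    g1 = k3 / k2 ** 1.5
    g2 = k4 / k2 ** 2
    return k1 + math.sqrt(k2) * (x_alpha
                                 + g1 / 6 * (x_alpha**2 - 1)
                                 + g2 / 24 * (x_alpha**3 - 3 * x_alpha)
                                 - g1**2 / 36 * (2 * x_alpha**3 - 5 * x_alpha))


def student_t_point(alpha, n):
    """Approximation (15) to the upper 100*alpha % point of t with n - 1 d.f. (n > 5)."""
    x = NormalDist().inv_cdf(1 - alpha)
    k2 = (n - 1) / (n - 3)
    k4 = 6 * (n - 1) ** 2 / ((n - 3) ** 2 * (n - 5))
    return cornish_fisher(x, 0.0, k2, 0.0, k4)


def nonnormal_t_point(t_alpha, alpha, n, k1, k2, k3, k4):
    """Formula (16): t'_alpha from the tabled t_alpha and the cumulants k1..k4 of
    the t-distribution under the non-normal population (supplied by the caller)."""
    x = NormalDist().inv_cdf(1 - alpha)
    g1 = k3 / k2 ** 1.5
    g2 = k4 / k2 ** 2
    bracket = (t_alpha * math.sqrt((n - 3) / (n - 1))
               + g1 / 6 * (x**2 - 1)
               + (g2 / 24 - 1 / (4 * (n - 5))) * (x**3 - 3 * x)
               - g1**2 / 36 * (2 * x**3 - 5 * x))
    return k1 + math.sqrt(k2) * bracket
```

Using the tabled $t_\alpha$ inside (16), rather than (15) alone, means the error of the four-term expansion for the ordinary $t$ largely cancels, which is presumably why the text works with the difference of (14) and (15).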


5. DISCUSSION OF RESULTS 


Ghurye (1949) has provided some illustrative examples to show the effect of $\lambda_3$ on power corresponding to significance levels $\alpha$ = 0·01 and $\alpha$ = 0·05. Our tabled values of $P_{\lambda_3}$ in Table 1 for $\alpha$ = 0·05, being calculated from a basically identical formula, are in agreement with his results. So far as $\lambda_3$ is concerned, we thus arrive at the same conclusions.

It appears from a comparative study of $P_{\lambda_3}$, $P_{\lambda_4}$ and $P_{\lambda_3^2}$ in Table 1 that the effect of $\lambda_3$ on the power of the $t$-test is, in general, greater than that of $\lambda_4$ or $\lambda_3^2$; but to study the actual contributions of $\lambda_3$, $\lambda_4$ and $\lambda_3^2$ we shall consider some power curves for different non-normal populations sampled.

The values of the power function are shown in Table 2 below for a sample of size 10 from a few non-normal populations specified by different values of $\lambda_3$ and $\lambda_4$, mostly within the Barton & Dennis (1952) limits, so that the frequency functions are positive definite and unimodal. The critical regions are erroneously given by the upper 5 % value of the ordinary $t$-distribution. The exact probabilities of the Type I error, as obtained on applying corrections due to non-normality, are those shown against $p_n$ = 0 in each case.


Table 2. Giving the values of power for different non-normal
populations when $n$ = 10 and $\alpha$ = 0·05


A: 
Ay Pn — J-i — — H 
| | 
-0-6 | -04 | -02 oo | o? 0-4 0-6 
— — 222 EL » E 
-10 | 0 | 0072 | 0004'| 0057 | 0-051 | 0-045 | 0040 | 0-036 
1 254 | -245 | -234 222 200 | -195 -179 
2 -556 | -559 | -559 559 58 555 550 
3 842 -850 859 | 870 -883 | -898 914 
4 -969 :975 -981 -988 -995 — — 
00 | 0 | 0-071 | 0-063 | 0-056 | 0.050 | 0-044 | 0-039 | 0-035 
1 | 268 +259 +248 -236 +223 -208 +193 
2 -576 -579 | 580 -580 -578 515 571 | 
| 3 | -839 847 -857 -868 -881 $95 | -9 | 
| 4 | -960 -966 972 979 -986 993 — 
| | 
L0 | 0 | 0-070 | 0-062 | 0055 | 0-049 | 0.043 | 0-038 | 0-034 | 
1 282 272 -262 -250 237 22² -206 
2 597 -600 601 -601 -599 -596 -591 
3 -837 -845 “854 866 878 +893 -909 
4 951 | 957 -963 -970 -977 -985 -992 
| | | | | 
20 | 0 | 0-009 | 0-061 | 0-054 | 0:048 | 0-043 | 0038 | 0-033 | 
1 -295 -286 -275 -263 -250 -236 -220 
2 618 -620 -622 -621 -620 -616 -612 
3 835 -843 -852 :863 -875 “891 -906 
4 | 943 948 958 961 968 976 -983 
| | l | 
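The ‘normal theory’ column of Table 2 (λ₃ = λ₄ = 0) can be reproduced independently as the power of the ordinary one-sided t-test. The sketch below rests on one assumption: that ρ₁ is the non-centrality √n(μ − μ₀)/σ, since this excerpt does not restate its definition. On that assumption the statistic is (Z + ρ₁)/√(V/ν), with V a χ² variable on ν = n − 1 degrees of freedom. Conditional on V, the power is Φ(ρ₁ − t₀√(V/ν)), and the sketch integrates this against the χ² density by Simpson's rule. It uses only the standard library, and the function names are illustrative.

```python
import math


def norm_cdf(x):
    """Unit normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def chi2_pdf(v, nu):
    """Density of a chi-square variable with nu degrees of freedom."""
    if v <= 0.0:
        return 0.0
    log_f = ((nu / 2 - 1) * math.log(v) - v / 2
             - (nu / 2) * math.log(2.0) - math.lgamma(nu / 2))
    return math.exp(log_f)


def normal_theory_power(rho1, t0, nu, upper=80.0, steps=2000):
    """Pr{t > t0} when t = (Z + rho1) / sqrt(V / nu) and V ~ chi2(nu).

    Conditional on V the probability is Phi(rho1 - t0 * sqrt(V / nu));
    the outer integral over V uses composite Simpson's rule (steps even)."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        v = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * chi2_pdf(v, nu) * norm_cdf(rho1 - t0 * math.sqrt(v / nu))
    return total * h / 3


def t_upper_point(alpha, nu):
    """Upper 100*alpha % point of Student's t, found by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if normal_theory_power(0.0, mid, nu) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


t0 = t_upper_point(0.05, 9)          # about 1.833 for n = 10
for rho1 in range(5):
    print(rho1, round(normal_theory_power(rho1, t0, 9), 3))
```

The printed values can be set beside the λ₃ = λ₄ = 0 entries of Table 2. The non-normal columns need the Edgeworth corrections of the paper, which this sketch does not attempt.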


From the examples considered, it is clear that λ₃ has a greater effect than λ₄ on the probability of a Type I error. Positive λ₃ reduces this probability and consequently also makes the power locally less than the ‘normal theory’ power. In the region of high power, however, the power is increased. The effect of low values of λ₄ on the power is quite small. For leptokurtic populations with high values of λ₄ (say, λ₄ ≥ 1), however, there is a noticeable increase in the power up to a certain point, followed by a decrease in the region of very high power. The effect of negative λ₄ is seen to be just the reverse.

Table 3 below compares the values of the power function for a sample of size 10 from a population with λ₃ = 0·6 and λ₄ = 0·4 in two cases. In case (i) the critical region is erroneously based on the upper (or lower) 5% point of the normal-theory t-distribution. In case (ii) it is based on (roughly) the upper (or lower) 5% point of the t-distribution for the non-normal population considered, as obtained by the method of § 4.

It may be seen that, with the critical region defined by t₀ = 1·833, the power decreases in the region of low power because α is reduced from 0·050 to 0·035. With the correct critical region, by contrast, the power is consistently greater than the ‘normal theory’ power. A similar conclusion about the increase of power in the region of low power can easily be drawn when ρ₁ is negative and t₀ = −1·833. Thus the behaviour of the power function in the immediate neighbourhood of the null hypothesis is much influenced by the choice of an erroneous critical region based on the assumption that the parent population is normal.

Table 3. Comparison of the power when the critical regions are erroneously and correctly obtained, for n = 10

        λ₃ = λ₄ = 0     λ₃ = 0·6, λ₄ = 0·4          λ₃ = 0·6, λ₄ = 0·4
  ρ₁    t₀ = 1·833      t₀ = 1·833   t₀ = 1·627     ρ₁    t₀ = −1·833
  0       0·050           0·035        0·051          0     0·070
  1        ·236            ·198         ·259         −1      ·273
  2        ·580            ·579         ·663         −2      ·?
  3        ·868            ·910         ·940         −3      ·838
  4        ·979            ·997         ·996         −4      ·957


6. CONCLUSION 


We have obtained the expression for the power function of Student's t-test for samples drawn from non-normal populations represented by the first four terms of the Edgeworth series. Using it, we have considered some numerical examples and drawn conclusions about the nature of the effects of parental skewness and kurtosis on the power of the t-test. We have also considered critical regions based on the assumption that the parent population is normal. Though erroneous, these regions have helped to bring out how the power curve is distorted when the usual t-test is applied to a sample from a population of known skewness and kurtosis.

It may be noted that the present work has aimed at answering only two kinds of question. The first is ‘how much non-normality can be allowed in a near-normal population without seriously affecting the significance level or the power of Student's t-test?’, and the second is ‘what types of effect will there be in different non-normal situations?’. The derived formulae are applicable only when the values of λ₃ and λ₄ for the sampled population are known a priori, and situations in which λ₃ and λ₄ (but not σ) are known seem to be rare in practice.

The study as a whole shows that, for practical purposes, the power of the t-test is not seriously affected even if the samples come from considerably non-normal populations. Over the whole range of values of λ₃ and λ₄ within the Barton & Dennis (1952) limits, the effects of λ₃ and λ₄ are broadly of the same magnitude. This is so because these limits permit values of λ₃ only below 0·4 or 0·5, while they permit values of λ₄ up to 2·4. The effect of skewness is more prominent when the kurtosis of the parent population is of low order. When the population is highly lepto- or platykurtic, however, the effects of λ₃ and λ₄ generally become equally prominent. In such a situation they may sometimes cancel each other. For example, for a leptokurtic, positively skewed population, the power of the t-test is likely to be quite close to the ‘normal theory’ power in the region of low power.

† When ρ₁ is negative, the hypothesis H₀ is tested against alternatives in the lower direction, and −t₀, the lower 100α% point, gives the critical region. In such a case it can easily be seen that the formula for Pn is the same as that given by (3) of § 2, except for a change in the sign of λ₃.

Also, as expected, our results show that the effect of non-normality on the power of the t-test diminishes as the sample size increases.


I wish to acknowledge gratefully the help and guidance I have received from Dr A. K. 
Gayen in the course of my investigations. I am also indebted to Prof. E. S. Pearson for 
suggesting a number of improvements to the presentation of the paper. 
NOTE ON MR SRIVASTAVA'S PAPER ON THE
POWER FUNCTION OF STUDENT'S TEST


By E. S. PEARSON
University College London


It is perhaps of historical interest to compare Mr A. B. L. Srivastava's theory with the results of the sampling experiments which Mr N. K. Adyanthaya and I carried out nearly 30 years ago (Biometrika (1929), 21, 276-85). Tables V, VII and VIII of that paper contained the earliest comparison of power functions of statistical tests, although our frequencies were tabled in terms of P_II, the probability of an error of the ‘second kind’. The concept of ‘power’, (1 − P_II), had […]. We were then dealing with a two-tailed test, the critical region of which contained both tails of the null distribution of t. However, the power function of the one-tailed test with α = 0·05 and that of the two-tailed test with α = 0·10 are very nearly the same for ρ₁ ≥ 1, so the following comparison is legitimate. Table V of our paper contains empirical values of 100P_II based on 100 random samples of size n = 10. The samples came from populations having Pearson-type frequency distributions with the following moment ratios:


Population          A       B       C       D                  E
λ₃ = √β₁            0       0       0       0·447 (= √0·20)    0·707 (= √0·50)
λ₄ = β₂ − 3      −0·5    1·12    4·07       0·30               0·75


A is a Type II distribution, B and C are Type VII distributions, and D and E are Type III or χ²-distributions.
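Populations D and E are Pearson Type III curves, and for these the two moment ratios are tied together. A gamma distribution with shape parameter m has λ₃ = 2/√m and λ₄ = 6/m, so that λ₄ = 1·5λ₃². The short check below recovers the λ₄ entries for D and E from their λ₃ values. It uses only the standard library, and the function names are illustrative.

```python
import math


def gamma_shape_from_skewness(lambda3):
    """Shape m of a Type III (gamma) curve with skewness lambda3 = 2 / sqrt(m)."""
    return 4.0 / lambda3 ** 2


def gamma_excess_kurtosis(shape):
    """lambda4 = beta2 - 3 = 6 / m for a gamma distribution of shape m."""
    return 6.0 / shape


for name, lam3 in [("D", math.sqrt(0.20)), ("E", math.sqrt(0.50))]:
    m = gamma_shape_from_skewness(lam3)
    print(name, round(m, 2), round(gamma_excess_kurtosis(m), 2))
# D 20.0 0.3
# E 8.0 0.75
```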

The theoretical values of 100P_II in the following table have been obtained from Mr Srivastava's equation (3), using the values of the P-functions given in his Table 1. The observed values are taken from Table V of Pearson & Adyanthaya's paper. For the symmetrical […] Pearson Type VII curve with the same moment ratios. Nevertheless, the trend of the frequencies is in the same direction as that obtained from Srivastava's equations.

It was, of course, never expected that these sets of 100 samples could do more than suggest the ways in which the ‘normal theory’ power curve for the single-sample t-test was sensitive to departures from normality. It is, however, satisfactory to find that the latest piece of theoretical work tends to confirm and round off this early empirical investigation.


[Comparison table of theoretical and observed 100P_II: only one column survives, reading 79·1, 41·9, 10·2, 0·8, 0·1.]


A QUICK ESTIMATE OF THE REGRESSION COEFFICIENT 


By D. E. BARTON and D. J. CASLEY
University College London

1. In this note we investigate some properties of a ‘quick’ estimator, b′, of the regression coefficient β of a variable y on a variable x. This estimate has two advantages over the least-squares estimate: (a) it is applicable to certain types of censored data, and (b) it provides a consistent estimator (under certain restrictions) of the slope parameter when y and x are structurally, rather than regressively, related, in the sense of Kendall (1951-2) and Neyman & Scott (1951). It has been chiefly considered in relation to problems of this latter kind (Wald, 1940; Banerjee & Nair, 1942; Bartlett, 1949; Hidimoto, 1956). It originated, however, in the situation where the x's were arbitrary constants (Bose, 1938; Nair & Shrivastava, 1942). We shall consider its behaviour when x and y are samples from a bivariate population. We show that it has an efficiency of 75-80% when the population is bivariate normal. The argument uses a conditional expectation technique which enables the results of David & Johnson (1954) to be applied.* We further examine its distribution from the point of view of its providing a quick test of the hypothesis β = 0.
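1.1. We have a set of n independent observations of (x, y), where

$$E(y \mid x) = \alpha + \beta x,$$

α and β being unknown constants. We order the x's so that x₁ < x₂ < … < xₙ and write

$$\bar x_U = \frac{1}{k}\sum_{i=n-k+1}^{n} x_i, \qquad \bar x_L = \frac{1}{k}\sum_{i=1}^{k} x_i. \tag{1}$$

If yᵢ is the variable paired with xᵢ (i = 1, …, n), we put

$$\bar y_U = \frac{1}{k}\sum_{i=n-k+1}^{n} y_i, \qquad \bar y_L = \frac{1}{k}\sum_{i=1}^{k} y_i,$$

and compute as our estimator

$$b' = \frac{\bar y_U - \bar y_L}{\bar x_U - \bar x_L}. \tag{2}$$

Since E(ȳ_U − ȳ_L | x₁, …, xₙ) = β(x̄_U − x̄_L), we have

$$E(b' \mid x_1, \ldots, x_n) = \beta, \tag{3}$$

and then

$$E(b') = \beta, \tag{4}$$

so that b′ is an unbiased estimator of β. Comparison is made with the least-squares regression coefficient b, and it will be noted at once that equations (3) and (4) hold equally for this statistic.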
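A minimal sketch of b′ as defined in (2), set beside the least-squares slope b, may make the construction concrete. The function names are illustrative. Breaking ties in x by y when sorting is also an assumption of the sketch, since the note itself takes the x's to be distinct.

```python
def quick_slope(x, y, k):
    """b' = (mean y of the k largest x's - mean y of the k smallest x's)
           / (mean of the k largest x's - mean of the k smallest x's)."""
    n = len(x)
    if len(y) != n or not 1 <= k <= n // 2:
        raise ValueError("need len(x) == len(y) and 1 <= k <= n // 2")
    pairs = sorted(zip(x, y))          # order the pairs by x
    lower, upper = pairs[:k], pairs[n - k:]
    x_gap = sum(a for a, _ in upper) - sum(a for a, _ in lower)
    y_gap = sum(b for _, b in upper) - sum(b for _, b in lower)
    return y_gap / x_gap               # the common factor 1/k cancels


def least_squares_slope(x, y):
    """Ordinary least-squares regression coefficient of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 2.0, 1.0, 5.0]
print(quick_slope(x, y, 1), quick_slope(x, y, 2), least_squares_slope(x, y))
# 1.6666666666666667 1.0 1.4
```

Only the 2k extreme pairs enter b′, which is why the estimate remains available when the central part of the data is censored.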


* It will be noted that Brown and Mosteller (Mosteller, 1946) have considered b′ as a quick estimator of the correlation coefficient in a bivariate normal population in the case where the variances of x and y are both known.

2. We suppose that the n bivariate observations (xᵢ, yᵢ) are independent, but not necessarily from the same population, and put

$$\sigma^2(x_i) = \operatorname{var}(y_i \mid x_i),$$

this notation indicating that σ²(xᵢ) may depend on xᵢ. We then have

$$\operatorname{var}(b' \mid x_1, \ldots, x_n) = \frac{\sum_{i=1}^{k}\{\sigma^2(x_i) + \sigma^2(x_{n+1-i})\}}{k^2(\bar x_U - \bar x_L)^2}. \tag{5}$$

With homoscedasticity, i.e. when σ²(xᵢ) = σ² for all i, this reduces to

$$\operatorname{var}(b' \mid x_1, \ldots, x_n) = \frac{2\sigma^2}{k(\bar x_U - \bar x_L)^2}. \tag{6}$$


We note that b′ depends on the order of the x's and not on that of the y's, so for a given set {xᵢ} it is a linear function of the y's. Hence, by the Gauss-Markov theorem, the least-squares regression coefficient b (weighted if σ(x) is not constant) is more efficient than b′ for a given set {xᵢ}. Thus, in general,

$$\operatorname{var} b' > \operatorname{var} b. \tag{7}$$

If we have homoscedasticity and if

$$\sigma_x^2 = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar x)^2, \tag{8}$$

then

$$\lim_{n\to\infty} n \operatorname{var} b = \sigma^2/\sigma_x^2. \tag{9}$$

We may thus use the variance of b as a standard of comparison, even if b is not itself realizable, for example because of incomplete knowledge of {xᵢ} in the censored case.

2.1. We now restrict ourselves to the case where the (xᵢ, yᵢ) are taken from the same bivariate population, the marginal probability density function of x being denoted by f(x). We consider first the limit as n → ∞ and k/n → p. In this limit x̄_U and x̄_L converge to ξₚ and ξ′ₚ, the means of the parts of the distribution of x lying above the upper p-tile aₚ and below the lower p-tile a′ₚ respectively. Then

$$\lim_{n\to\infty} n \operatorname{var} b' = \frac{\sigma_p^2 + \sigma_p'^2}{p(\xi_p - \xi_p')^2}, \tag{10}$$

where

$$\xi_p = \frac{1}{p}\int_{a_p}^{\infty} x f(x)\,dx, \qquad \sigma_p^2 = \frac{1}{p}\int_{a_p}^{\infty}\sigma^2(x) f(x)\,dx, \tag{11}$$

and ξ′ₚ and σ′ₚ² are defined similarly by integrals from −∞ to a′ₚ.

Brown and Mosteller gave an analogous result in the normal case.
In the homoscedastic case this gives the efficiency of b′ relative to b as

$$e_p = \frac{p(\xi_p - \xi_p')^2}{2\sigma_x^2}, \tag{12}$$

and, in fact, it may be shown strictly that eₚ < 1. A surprising result, however, is that the maximum of eₚ over p is not very different from unity. For instance, when x is rectangularly distributed, eₚ = 6p(1 − p)², whose maximum is 8/9, at p = 1/3. Further, we shall see in the next section that when x is normally distributed the maximum is eₚ = 0·8098, at p = 0·2703, while eₚ > 0·75 for 0·167 < p < 0·397. Thus b′ will give quite a respectable quick estimate in situations where speed of computation compensates for loss of information.
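The large-sample efficiency (12) is easy to evaluate once the tail means ξₚ and ξ′ₚ and the variance σₓ² of x are known. The sketch below treats the two cases quoted in the text and uses only Python's standard library:

- Rectangular x on (0, 1): ξₚ − ξ′ₚ = 1 − p and σₓ² = 1/12.
- Unit normal x: ξₚ − ξ′ₚ = 2Zₚ/p, where Zₚ is the ordinate at the upper p-point.

The golden-section search is an illustrative choice of maximizer and is not part of the note.

```python
from statistics import NormalDist

_UNIT_NORMAL = NormalDist()


def efficiency(p, tail_gap, var_x):
    """e_p = p * (xi_p - xi'_p)**2 / (2 * sigma_x**2), as in (12)."""
    return p * tail_gap ** 2 / (2.0 * var_x)


def efficiency_rectangular(p):
    # x uniform on (0, 1): tail means 1 - p/2 and p/2, variance 1/12.
    return efficiency(p, 1.0 - p, 1.0 / 12.0)


def efficiency_normal(p):
    # x unit normal: tail means +Z_p/p and -Z_p/p, where Z_p is the
    # ordinate at the upper p-point X_p; this reduces to 2 * Z_p**2 / p.
    x_p = _UNIT_NORMAL.inv_cdf(1.0 - p)
    z_p = _UNIT_NORMAL.pdf(x_p)
    return efficiency(p, 2.0 * z_p / p, 1.0)


def maximize(f, lo=0.01, hi=0.49, iters=100):
    """Golden-section search for the maximum of a unimodal f on [lo, hi]."""
    g = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    p = (a + b) / 2
    return p, f(p)


print(maximize(efficiency_rectangular))   # close to (1/3, 8/9)
print(maximize(efficiency_normal))        # close to (0.2703, 0.8098)
```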


3. When the variables (xᵢ, yᵢ) are drawn from the bivariate normal population with parameters (μₓ, μ_y, σₓ, σ_y, ρ), we have

$$\sigma^2(x) = \sigma_y^2(1-\rho^2), \qquad \beta = \rho\frac{\sigma_y}{\sigma_x}, \qquad \alpha = \mu_y - \rho\frac{\sigma_y}{\sigma_x}\mu_x. \tag{13}$$

It is plain in this case that

$$v = \sqrt{\frac{k}{2}}\;\frac{\sigma_x(b' - \beta)}{\sigma_y\sqrt{1-\rho^2}} \tag{14}$$

is distributed, independently of the five parameters, as the quotient of a unit normal variable and an independent variable, d say. Here d is the difference between the means of the k largest and the k least of a sample of n independent unit normal variables.

The conditional distribution of v for a given d is normal and therefore symmetrical, so the overall distribution of v is also symmetrical.


Fig. 1. Graph of 100eₚ, the large-sample percentage efficiency of b′, as a function of p (plotted from 0·10 to 0·50), where 2p is the proportion of the observations whose actual values are used in computing b′.


We have

$$\operatorname{var} v = E(d^{-2}), \tag{15}$$

and its moment ratios are

$$\beta_1 = 0, \qquad \beta_2 = \frac{3E(d^{-4})}{\{E(d^{-2})\}^2}. \tag{16}$$


The problems which will concern us are: (1) the value of p for which the large-sample efficiency of b′ is greatest; (2) the value of p for which the relative efficiency of b′ to b is greatest in finite samples; (3) approximations to the distribution of v in these cases.


3.1. Following as far as possible the notation of the Introduction to Pearson & Hartley (1954), we write Xₚ for the root of p = Q(X) and put Zₚ = Z(Xₚ) and φₚ = Zₚ/p.† We then have

$$\lim_{n\to\infty} E(d) = 2\phi_p.$$

Hence, since ξₚ − ξ′ₚ = 2φₚ and σₓ = 1 for a unit normal x, (12) gives

$$e_p = 2p\phi_p^2 = 2Z_p^2/p. \tag{17}$$

Because dZₚ/dp = Xₚ, this is a maximum when 2pXₚ = Zₚ, i.e. when 2Xₚ = φₚ. If p₀ is the root of this equation, then p₀ = 0·2703 and e_{p₀} = 0·8098. A graph of eₚ is drawn in Fig. 1.

† The function φₚ is tabulated in K. Pearson (1931).
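3.2. In general it has not been found feasible in the small-sample case to evaluate the relative efficiency of b′ to b explicitly. Denote this efficiency by E(p, n), and use the well-known result that var b = σ_y²(1 − ρ²)/{σₓ²(n − 3)}. Then (14) and (15) give

$$E(p,n) = \frac{k}{2(n-3)\,E(d^{-2})}, \tag{18}$$

and we have had to expand this as a series

$$E(p,n) = e_p + \frac{f_p}{n} + O(n^{-2}) \tag{19}$$

by a slight extension of the methods of David & Johnson (1954). Thus, if

$$V = \lim_{n\to\infty} n \operatorname{var} d, \qquad B = \lim_{n\to\infty} n\{E(d) - 2\phi_p\}, \tag{20}$$

then

$$E(d^{-2}) = \frac{1}{4\phi_p^2}\left\{1 + \frac{1}{n}\left(\frac{3V}{4\phi_p^2} - \frac{B}{\phi_p}\right) + O(n^{-2})\right\}. \tag{21}$$

Similarly

$$E(d^{-4}) = \frac{1}{16\phi_p^4}\left\{1 + \frac{1}{n}\left(\frac{5V}{2\phi_p^2} - \frac{2B}{\phi_p}\right) + O(n^{-2})\right\}. \tag{22}$$

Let us consider a sample of n independent unit normal variables and order them so that t₁ < t₂ < … < tₙ. Conditionally on t_{n−k}, the k largest values are independent unit normal variables truncated below at t_{n−k}. Their mean therefore has conditional expectation R(t_{n−k}), where R(t) = Z(t)/Q(t). The k least behave similarly, conditionally on t_{k+1}. Hence, by symmetry,

$$nE(d - 2\phi_p) = 2nE\{R(t_{n-k}) - \phi_p\}.$$

Similarly,

$$\operatorname{var} d = E\{\operatorname{var}(d \mid t_{k+1}, t_{n-k})\} + \operatorname{var}\{E(d \mid t_{k+1}, t_{n-k})\} = \frac{1}{k}E\{S(t_{n-k}) + S(-t_{k+1})\} + \operatorname{var}\{R(t_{n-k}) + R(-t_{k+1})\}, \tag{23}$$

where S(t) = 1 + tR(t) − R(t)² is the variance of a unit normal variable truncated below at t.

It is then a simple exercise in David and Johnson's technique (using their Table 2) to obtain B and V as functions of Xₚ, φₚ and p, with k = (n + 1)p + O(1) as n → ∞. In particular,

$$V = \frac{2}{p}\left(1 - X_p\phi_p + X_p^2\right) - 4(\phi_p - X_p)^2. \tag{24}$$

Putting the values (24) in (21) and the result into (19), we obtain the second coefficient fₚ, again as a function of Xₚ, φₚ and p. The maximal p may be written

$$p_{\max} = p_0 + \frac{p_1}{n} + O(n^{-2}),$$

where p₀ maximizes eₚ and p₁ = −f′_{p₀}/e″_{p₀}, the primes here denoting differentiation with respect to p. At the maximum we have

$$p_{\max} = 0\cdot2703 - 0\cdot0712\,n^{-1} + O(n^{-2}),$$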
and the efficiency at the optimum is E_max = E(p_max, n). To this order it equals the efficiency at p₀, since the two differ only by O(n⁻²). Thus

$$E_{\max} = 0\cdot8098\{1 - 1\cdot0201\,n^{-1} + O(n^{-2})\}. \tag{25}$$

Since 0·8098(1 − 1·0201n⁻¹) ≈ 0·81n/(n + 1), the expression 81n/(n + 1) should adequately describe the percentage efficiency in small samples. This is only a ‘first-order correction’ to the large-sample result. Taken together with Fig. 1, however, it suggests the following conclusion. If k/(n + 1) lies between one-sixth and one-third, the efficiency will be between 70 and 80% for samples of ten or more. The optimum proportion is about one-quarter for smaller samples and one-third for larger ones.
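The small-sample statements above can be checked roughly by simulation. The sketch below draws samples of size n from a bivariate normal population, computes b′ and b for each, and returns the ratio of their sampling variances as an estimate of E(p, n). The population has unit variances and correlation rho; since E(p, n) does not depend on the parameters, these values are arbitrary. The sample size, k, seed and number of replicates are also illustrative assumptions. With n = 30 and k = 8, the estimate should fall near 0·8098(1 − 1·0201/30) ≈ 0·78, up to Monte Carlo error.

```python
import random
import statistics


def slopes(x, y, k):
    """Return (b', b): the quick slope (2) and the least-squares slope."""
    n = len(x)
    pairs = sorted(zip(x, y))                     # order the pairs by x
    lower, upper = pairs[:k], pairs[n - k:]
    b_quick = ((sum(v for _, v in upper) - sum(v for _, v in lower))
               / (sum(u for u, _ in upper) - sum(u for u, _ in lower)))
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b_ls = (sum((u - mx) * (v - my) for u, v in zip(x, y))
            / sum((u - mx) ** 2 for u in x))
    return b_quick, b_ls


def simulated_efficiency(n, k, rho=0.5, reps=4000, seed=1):
    """Monte Carlo estimate of E(p, n) = var b / var b', with p about k / n."""
    rng = random.Random(seed)
    s = (1.0 - rho ** 2) ** 0.5
    quick, ls = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rho * u + s * rng.gauss(0.0, 1.0) for u in x]
        b_quick, b_ls = slopes(x, y, k)
        quick.append(b_quick)
        ls.append(b_ls)
    return statistics.variance(ls) / statistics.variance(quick)


estimate = simulated_efficiency(30, 8)
print(round(estimate, 3))
```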


3.3. If we know that σₓ = σ_y, we may wish to use b′ to provide a quick test of the hypothesis ρ = 0, or to give confidence intervals for ρ. To do this it is necessary to know, at least approximately, the distribution of the v of equation (14). Plainly v tends to be normally distributed for large n. For finite n we may expand β₂ as a power series in n⁻¹. To do so, we put the results of (24) into the expansions of E(d⁻²) and E(d⁻⁴), and these in turn into (16). Thus

$$\beta_2 = 3\left\{1 + \frac{V}{n\phi_p^2} + O(n^{-2})\right\}, \tag{26}$$

which gives, at the value p₀ of p,

$$\beta_2 = 3\{1 + 2\cdot0893\,n^{-1} + O(n^{-2})\}. \tag{27}$$

This result suggests that we may expect an appreciable degree of leptokurtosis for sample sizes in the range commonly met with. It also suggests that (27), used with the Pearson-Merrington table (Table 42 of Pearson & Hartley (1954)), will indicate the order of the error incurred by using the normal approximation.
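The coefficient of n⁻¹ in β₂ can be recomputed from the limiting variance of d. Here Xₚ is the upper p-point of the unit normal distribution, φₚ = Zₚ/p, and V = (2/p)(1 − Xₚφₚ + Xₚ²) − 4(φₚ − Xₚ)² is the limit of n var d. The coefficient inside the braces is then V/φₚ². The short sketch below uses the standard library's NormalDist; the function name is illustrative. As a sanity check, at p = ½ the formula gives 2π − 4, because d then reduces to twice the mean absolute deviation from the median.

```python
from statistics import NormalDist


def kurtosis_coefficient(p):
    """c_p in beta_2 = 3 {1 + c_p / n + O(n**-2)}, with c_p = V / phi_p**2.

    V = (2/p)(1 - X_p phi_p + X_p**2) - 4 (phi_p - X_p)**2 is the limit of
    n var d; X_p is the upper p-point and phi_p = Z_p / p."""
    unit = NormalDist()
    x_p = unit.inv_cdf(1.0 - p)
    phi_p = unit.pdf(x_p) / p
    v = (2.0 / p) * (1.0 - x_p * phi_p + x_p ** 2) - 4.0 * (phi_p - x_p) ** 2
    return v / phi_p ** 2


c = kurtosis_coefficient(0.2703)
print(round(c, 4))                          # coefficient at p0
for n in (10, 20, 50):
    print(n, round(3 * (1 + c / n), 3))     # first-order beta_2 for v
```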
ON THE CHOICE OF THE BEST AMONGST THREE NORMAL
POPULATIONS WITH KNOWN VARIANCES*


By A. ZINGER and J. ST-PIERRE
University of Montreal, Canada


1. Summary. Statistical decisions to be taken in the case of three normal populations with 
known variances are investigated in the following two situations: 

(i) The problem of selecting the population with the largest mean (the case of the smallest 
is obviously similar), given a probability of taking a wrong decision. 

(ii) The problem of determining the smallest sample sizes required to detect the population with the largest mean, for some given values of the non-centrality parameters. These sample sizes depend on the probabilities of good and wrong decisions.

The statistic used in connexion with these problems is the standardized difference between 
the two largest sample means. 


2. Introduction. In many instances, the various criteria used for testing equality of 
means do not give pertinent answers to the questions that the experimenter has in mind; 
in fact he knows that the population means are not equal. He may be interested, for example, 
in detecting the ‘best’ population (the population with largest mean). 

Several authors give tests intended to be sensitive to a single outlying mean. Irwin (1925b) proposed the standardized difference between the two largest observations as a test criterion, having earlier (1925a) given exact and approximate results for the null distribution of his statistic. An exact formulation of the null distribution is also given by St-Pierre & Zinger (1956). McKay (1935) proposed the standardized difference between the largest observation and the sample mean as a test criterion and obtained its null distribution. This distribution, together with that of the studentized† difference, has been further considered by Pearson & Chandra Sekar (1936) and Nair (1948). None of these authors studied the power of their test criteria.

The present paper takes Irwin's statistic for three sample means and uses it in a procedure for detecting the largest mean, as first proposed by Bose & St-Pierre (1954). The non-null distribution is also derived. The design-of-experiment aspect of this problem was considered by Bechhofer (1954), and his solution is improved upon here.


* Work done under the sponsorship of the National Research Council of Canada. It forms part of a thesis by A. Zinger, written under the direction of J. St-Pierre and submitted to the University of Montreal in partial fulfilment of the requirements for the Ph.D. degree.

† For a ‘studentized’ difference, an estimate of variance is introduced in place of a known population value.

3. Non-null distribution of the standardized difference between the two largest sample means. Let us consider three normal populations with unknown means μᵢ and known variances σᵢ², i = 0, 1, 2. From the ith population a sample of nᵢ values xᵢⱼ (i = 0, 1, 2; j = 1, …, nᵢ) is drawn. Let x̄ᵢ be the sample mean associated with the population having mean μᵢ. Let x̄₍₀₎ > x̄₍₁₎ > x̄₍₂₎ be the ordered sample means; the event x̄ᵢ = x̄ⱼ will of course be neglected. Let μ₍₀₎ > μ₍₁₎ > μ₍₂₎ be the ordered unknown means. It is not known which population is associated with μ₍₀₎. Only six mutually exclusive events are possible: (x̄₍₀₎, x̄₍₁₎, x̄₍₂₎) comes from the populations with means (μ₍ᵢ₎, μ₍ⱼ₎, μ₍ₖ₎), where (i, j, k) is a permutation of (0, 1, 2). It follows that the joint density of x̄₍₀₎, x̄₍₁₎, x̄₍₂₎ is given by

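$$f(\bar x_{(0)}, \bar x_{(1)}, \bar x_{(2)}) = \frac{(n_0 n_1 n_2)^{1/2}}{(2\pi)^{3/2}\sigma_0\sigma_1\sigma_2}\sum_{P}\exp\left[-\frac{1}{2}\left\{\frac{n_{(i)}(\bar x_{(0)} - \mu_{(i)})^2}{\sigma_{(i)}^2} + \frac{n_{(j)}(\bar x_{(1)} - \mu_{(j)})^2}{\sigma_{(j)}^2} + \frac{n_{(k)}(\bar x_{(2)} - \mu_{(k)})^2}{\sigma_{(k)}^2}\right\}\right],$$

where Σ_P means summation over all permutations (i, j, k) of (0, 1, 2), and n₍ᵢ₎ and σ₍ᵢ₎ denote the sample size and standard deviation of the population with mean μ₍ᵢ₎. The test which we shall consider assumes that σ₀, σ₁ and σ₂ are known and that the sample sizes are chosen so that nᵢ = σᵢ²/c² (i = 0, 1, 2). We then derive the distribution of

$$u = (\bar x_{(0)} - \bar x_{(1)})/c,$$

and write y = (μ₍₀₎ − μ₍₁₎)/c and δ = (μ₍₁₎ − μ₍₂₎)/c. We change the variables to u, (x̄₍₁₎ − x̄₍₂₎)/c and (x̄₍₀₎ + x̄₍₁₎ + x̄₍₂₎)/(c√3), and integrate out all but u. This gives

$$\psi(u; y, \delta) = g(u, y, \delta) + w(u, y, \delta), \tag{1}$$

where, with φ(t) = (2π)^{−1/2}e^{−t²/2},

$$g(u, y, \delta) = \frac{1}{2\sqrt{\pi}}\left[e^{-(u-y)^2/4}\int_{(u-y-2\delta)/\sqrt6}^{\infty}\phi(t)\,dt + e^{-(u-y-\delta)^2/4}\int_{(u-y+\delta)/\sqrt6}^{\infty}\phi(t)\,dt\right],$$

$$w(u, y, \delta) = \frac{1}{2\sqrt{\pi}}\left[e^{-(u+y)^2/4}\int_{(u-y-2\delta)/\sqrt6}^{\infty}\phi(t)\,dt + e^{-(u+y+\delta)^2/4}\int_{(u-y+\delta)/\sqrt6}^{\infty}\phi(t)\,dt + \left\{e^{-(u-\delta)^2/4} + e^{-(u+\delta)^2/4}\right\}\int_{(u+2y+\delta)/\sqrt6}^{\infty}\phi(t)\,dt\right].$$

Note that g(u, y, δ) comes from the terms in f(x̄₍₀₎, x̄₍₁₎, x̄₍₂₎) with i = 0, and w(u, y, δ) from the remainder.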
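4. Integration of components of the density of u. Let us define

$$W(k, y, \delta) = \int_k^{\infty} w(u, y, \delta)\,du \quad\text{and}\quad G(k, y, \delta) = \int_k^{\infty} g(u, y, \delta)\,du.$$

These functions have been evaluated as follows.

(i) W(k, 0, δ): by numerical integration of w(u, 0, δ).

(ii) W(k, y, δ): using the relation

$$W(k, y, \delta) = W(k + y, 0, y + \delta) + W(k + y + \delta, 0, y) - W(k + 2y + \delta, 0, 0).$$

(iii) G(k, y, δ): using

$$G(k, y, \delta) = W(k - y, 0, \delta) - \tfrac{1}{2}W(k - y + \delta, 0, 0) \quad\text{if } k \ge y,$$

and

$$G(k, y, \delta) = N\{(y - k)/\sqrt2\} + N\{(y - k + \delta)/\sqrt2\} + W(y - k + \delta, 0, \delta) - \tfrac{1}{2}W(y - k + 2\delta, 0, 0) \quad\text{if } k < y,$$

where N(x) = (2π)^{−1/2}∫₀ˣ e^{−t²/2} dt. The functions W(k, y, δ) and G(k, y, δ) have been tabulated for 57 pairs of parameter values.*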
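G and W can also be obtained by direct quadrature of the component densities g and w. In the sketch below, each tail integral ∫ₐ^∞ φ(t) dt is computed as ½erfc(a/√2). The integration over u runs from k to k + 40 by Simpson's rule, and a large finite δ stands in for δ = ∞. The sketch illustrates the formulas under these assumptions and is not the authors' tabulation method; the function names are hypothetical. Its checks confirm that G + W = 1 when k = 0 and that relations (ii) and (iii) hold.

```python
import math

_ROOT6 = math.sqrt(6.0)
_NORM = 1.0 / (2.0 * math.sqrt(math.pi))


def upper_tail(a):
    """Integral of the unit normal density from a to infinity."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))


def g_density(u, y, delta):
    """Density of u on the event that xbar_(0) comes from mu_(0)."""
    return _NORM * (
        math.exp(-(u - y) ** 2 / 4) * upper_tail((u - y - 2 * delta) / _ROOT6)
        + math.exp(-(u - y - delta) ** 2 / 4) * upper_tail((u - y + delta) / _ROOT6))


def w_density(u, y, delta):
    """Density of u on the event that xbar_(0) comes from mu_(1) or mu_(2)."""
    return _NORM * (
        math.exp(-(u + y) ** 2 / 4) * upper_tail((u - y - 2 * delta) / _ROOT6)
        + math.exp(-(u + y + delta) ** 2 / 4) * upper_tail((u - y + delta) / _ROOT6)
        + (math.exp(-(u - delta) ** 2 / 4) + math.exp(-(u + delta) ** 2 / 4))
        * upper_tail((u + 2 * y + delta) / _ROOT6))


def _simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b] with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def G(k, y, delta, span=40.0):
    """Probability of a good decision: integral of g from k upwards."""
    return _simpson(lambda u: g_density(u, y, delta), k, k + span)


def W(k, y, delta, span=40.0):
    """Probability of a wrong decision: integral of w from k upwards."""
    return _simpson(lambda u: w_density(u, y, delta), k, k + span)


print(round(G(1.0, 1.0, 1.0), 3), round(W(1.0, 1.0, 1.0), 3))
```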


5. Detection of the best population. The following procedure is proposed for detecting the best population.

(i) Draw nᵢ observations from the ith population, subject to the restriction nᵢ/σᵢ² = 1/c², i = 0, 1, 2.

(ii) Let x̄₍₀₎ > x̄₍₁₎ > x̄₍₂₎ be the ordered sample means. Compute u = (x̄₍₀₎ − x̄₍₁₎)/c.

(iii) If u ≥ k, decide that x̄₍₀₎ comes from the population with mean μ₍₀₎, i.e. from the best population; if u < k, do not take this decision.

* Tables can be obtained from the authors upon request.
Since the statistic used was first introduced by Irwin (1925b), this procedure will henceforth be referred to as the I-procedure.

The critical value k is to be chosen so that the probability of a wrong choice is at most α (0 < α < 1), a number given in advance; α will be called the level. A wrong decision is taken when u ≥ k and x̄₍₀₎ comes from the population with mean μ₍₁₎ or μ₍₂₎. It is readily seen from (1) that

$$\Pr\{u \ge k \text{ and } \bar x_{(0)} \text{ comes from } \mu_{(1)} \text{ or } \mu_{(2)}\} = W(k, y, \delta) \le \alpha.$$


Let us now define the least favourable configuration. The configuration (y₀, δ₀) is the least favourable for a given value of k if, for all y and δ,

$$W(k, y, \delta) \le W(k, y_0, \delta_0).$$


Usually the least favourable configuration is independent of the critical value. In the present case, however, it is not unique but depends on the critical value k (or on the level α). Some calculations based on the tables show the following.

(i) For k < 0·856 (α > 0·2726) the least favourable configuration is y₀ = 0, δ₀ = 0.

(ii) For k > 0·856 (α < 0·2726) the least favourable configuration is y₀ = 0, δ₀ = ∞.

These configurations will be referred to as the null and the pseudo-null configurations, respectively.

The critical values k, given in Table 1, are obtained by solving the equations

$$W(k, 0, 0) = \alpha \quad\text{if } \alpha > 0\cdot2726, \qquad W(k, 0, \infty) = \alpha \quad\text{if } \alpha < 0\cdot2726.$$


The following example shows how to apply the I-procedure.

Consider three normal populations with variances 20, 5 and 10 respectively. Suppose the experimenter wants a level of 0·01, so that the critical value is 3·29 (Table 1). Suppose also that the sample sizes are 80, 20 and 40 respectively, so that σᵢ²/nᵢ = 0·25 for each population and c = 0·5. Let the sample means be 15·1, 14·0 and 16·9. It follows that u = (16·9 − 15·1)/0·5 = 3·6 > 3·29, so the experimenter chooses the third population as the best.
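For α < 0·2726 the critical value comes from the pseudo-null configuration, which has a simple closed form. With y = 0 and δ = ∞, the third population never supplies the largest mean, and the other two have equal means. A wrong decision then means that the mean labelled μ₍₁₎ exceeds the other by at least kc. That difference has variance 2c², so W(k, 0, ∞) = Pr{N(0, 2) ≥ k}. This reasoning step is ours rather than the authors'. It does, however, reproduce 3·29 for α = 0·01 and 2·326 for α = 0·05, as well as the levels 0·2398, 0·0786 and 0·0169 quoted later for k = 1, 2 and 3. The sketch below applies the I-procedure to the worked example; it uses the standard library only, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def critical_value(alpha):
    """k for the I-procedure when alpha < 0.2726 (pseudo-null configuration).

    Solves W(k, 0, inf) = Pr{N(0, 2) >= k} = alpha."""
    if not 0.0 < alpha < 0.2726:
        raise ValueError("the pseudo-null shortcut needs 0 < alpha < 0.2726")
    return math.sqrt(2.0) * NormalDist().inv_cdf(1.0 - alpha)


def i_procedure(means, variances, sizes, k):
    """Apply steps (i)-(iii): return (index of chosen population or None, u)."""
    c2 = [v / n for v, n in zip(variances, sizes)]
    if not all(math.isclose(a, c2[0]) for a in c2):
        raise ValueError("sample sizes must satisfy n_i / sigma_i**2 = 1 / c**2")
    c = math.sqrt(c2[0])
    order = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    u = (means[order[0]] - means[order[1]]) / c
    return (order[0] if u >= k else None), u


k = critical_value(0.01)
choice, u = i_procedure([15.1, 14.0, 16.9], [20, 5, 10], [80, 20, 40], k)
print(round(k, 2), round(u, 2), choice)   # 3.29 3.6 2 (index 2 = third population)
```

For α above 0·2726 the null configuration governs instead, and W(k, 0, 0) has to be evaluated from the tables or by quadrature.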

The experimenter may also be interested in the power of the test, i.e. the probability of a good decision. This probability is G(k, y, δ), since a good decision is made when u ≥ k and x̄₍₀₎ comes from the population with mean μ₍₀₎. The following relations are helpful in calculating the power of the test:

$$G(k, y, \delta) = G(k - y, 0, \delta) \quad\text{if } k \ge y, \qquad G(k, y, \delta) = G(k - y + C, C, \delta) \quad\text{if } k < y,$$

where C (≥ y − k) is chosen so that the tables can be used.

Whenever additional information is available about the non-centrality parameters, the proposed procedure may be improved by selecting the appropriate least favourable configuration. Suppose it is known that δ ≤ Δ and that α < 0·2726. Then:

(a) the least favourable configuration is (0, 0) and the critical value is k₁ if α = W(k₁, 0, 0) > W(k₁, 0, Δ);

(b) the least favourable configuration is (0, Δ) and the critical value is k₂ if α = W(k₂, 0, Δ) > W(k₂, 0, 0).
Table 1. Critical values for the I-procedure

[Table body not recoverable; only the row labels 0·00, 0·05, 0·10, …, 0·60 survive.]


For example, suppose that α = 0·05 and δ ≤ 2. The solutions of 0·05 = W(k₁, 0, 0) and 0·05 = W(k₂, 0, 2) are k₁ = 1·96 and k₂ = 2·20 respectively. It follows that the least favourable configuration is (0, 2) and that the critical value is 2·20. In the absence of information on δ the critical value is 2·326.

A particular case of great interest is that of testing for outliers. In that case Δ = 0 and the least favourable configuration is (0, 0). The most common values of α and k are: […]

6. Numerical investigation. We wished to appreciate more clearly the working of the test and to compare it with an alternative procedure which might be based on a test given by McKay (1935). We have therefore carried out an investigation, the results of which are


Table 2. Probabilities of good (G) and wrong (W) decisions for the I-procedure and the M-procedure

                           I-procedure                 M-procedure
                     Theoretical      Empirical         Empirical
  y     δ    Level α    G       W       G       W         G       W
 (1)   (2)    (3)      (4)     (5)     (6)     (7)       (8)     (9)
 0·2   0·2   0·240    0·164   0·185   0·156   0·186     0·102     …
             0·079    0·039   0·035   0·032   0·036     0·041     …
             0·017    0·005   0·004   0·009   0·001     0·012     …
 0·2   0·4   0·240    0·181   0·177   0·178   0·173     0·124   0·125
             0·079    0·045   0·035   0·040   0·036     0·047   0·040
             0·017    0·007   0·004   0·009   0·002     0·015   0·003
 0·2   2·0   0·240    0·270  (0·189)  0·270   0·191     0·304   0·250
             0·079    0·089  (0·052)  0·080   0·052     0·145   0·113
             0·017    0·018  (0·009)  0·017   0·007     0·050   0·044
 0·4   0·2   0·240    0·205   0·157   0·198   0·151     0·143   0·106
             0·079    0·054   0·028   0·051   0·027     0·054   0·033
             0·017    0·008   0·003   0·013   0·001     0·019   0·002
 0·4   0·4   0·240    0·225   0·149   0·216   0·143     0·162   0·102
             0·079    0·062   0·027   0·058   0·026     0·062   0·033
             0·017    0·010   0·003   0·015   0·002     0·021   0·002
 0·4   1·0   0·240    0·275   0·144   0·262   0·147     0·244   0·124
             0·079    0·086   0·031   0·081   0·029     0·100   0·050
             0·017    0·016   0·002   0·017   0·003     0·037   0·006
 0·4   2·0   0·240    0·321   0·155   0·314   0·160     0·372   0·215
             0·079    0·115   0·040   0·106   0·040     0·201   0·100
             0·017    0·026   0·006   0·026   0·005     0·075   0·030
 1·0   0·0   0·240    0·333   0·094   0·318   0·092     0·277   0·069
             0·079    0·113   0·014   0·102   0·012     0·112   0·012
             0·017    0·023   0·001   0·024   0·000     0·039   0·002
 1·0   1·0   0·240    0·443   0·074   0·440   0·072     0·424   0·079
             0·079    0·183   0·013   0·172   0·013     0·229   0·028
             0·017    0·047   0·001   0·044   0·001     0·084   0·002
 1·0   2·0   0·240    0·488  (0·072)  0·490   0·074     0·574   0·135
             0·079    0·224  (0·015)  0·220   0·017     0·369   0·062
             0·017    0·067  (0·002)  0·060   0·001     0·163   0·015
 2·0   0·0   0·240    0·634   0·025   0·630   0·020     0·596   0·018
             0·079    0·333   0·003   0·319   0·002     0·371   0·002
             0·017    0·113   0·000   0·102   0·000     0·164   0·001
 2·0   2·0   0·240   (0·756) (0·017)  0·758   0·017     0·839   0·048
             0·079    0·488  (0·002)  0·492   0·001     0·698   0·020
             0·017    0·224  (0·000)  0·219   0·001     0·465   0·002

Theoretical probabilities in parentheses were obtained by interpolation.
G stands for good decision, W for wrong decision.
summarized in Table 2. We first calculated the exact probabilities of good and wrong decisions under the I-procedure, i.e. G(k, y, δ) and W(k, y, δ) (columns 4 and 5), for the combinations of y and δ shown in columns 1 and 2.

Three critical values k have been taken, namely 1·0, 2·0 and 3·0. As seen from Table 1, these correspond to levels α of 0·2398, 0·0786 and 0·0169 respectively. The following points may be noted.

(a) For given α and y, G increases with δ. In practice, however, the statistician would probably not be prepared to take α as large as 0·24. At the smaller levels, y must be fairly large (1·0 or 2·0) before there is an appreciable chance of reaching a good decision.

(b) For the cases considered, the values of W come nearest to those of α when y = 0·2, δ = 2·0, i.e. in the situation nearest to the most unfavourable one.

(c) Generally, W is far less than α. This confirms the statement in the preceding section that, if information is available about y or δ, the test could be made more sensitive by reducing k.

We also carried out an empirical investigation for the twelve pairs of (y, δ) values listed in Table 2. For this purpose, 1000 initial triplets (x, y, z) of normal values, with mean 2 and variance unity, were taken from Dixon & Massey (1951). The samples were unexceptional, since the respective means for x, y and z are 2·027, 2·025 and 2·025. Each initial triplet was used to define twelve triplets, corresponding to the following parameter values:

y:  0·2  0·2  0·2  0·4  0·4  0·4  0·4  1    1    1    2    2
δ:  0·2  0·4  2·0  0·2  0·4  1·0  2·0  0    1    2    0    2

For example, the first initial triplet (2·422, 0·694, 1·875) generates the triplet (2·822, 0·894, 1·875) by adding y + δ = 0·4 to the first value and δ = 0·2 to the second. The new triplet is a sample from populations whose means correspond to the non-centrality parameters y = 0·2 and δ = 0·2. The other eleven triplets are generated in a similar way.

The resulting numbers of good and wrong decisions, expressed as proportions of 1000, are 
given in columns 6 and 7 of the table. They are in satisfactory agreement with the theoretical 
values. It is now possible to use these empirical data to make some comparisons with a pro- 
cedure based on McKay’s test. 
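The empirical part of this investigation can be imitated by simulation. The sketch below works in units of c, so that the three sample means are independent unit normal variables with means μ₍₀₎ = y + δ, μ₍₁₎ = δ and μ₍₂₎ = 0. It applies the I-procedure with critical value k and counts good and wrong decisions. The authors reused the same 1000 Dixon & Massey triplets for all twelve configurations; this sketch instead draws fresh pseudo-random normals on each call. The seed, the number of replicates and the function name are arbitrary choices.

```python
import random


def simulate_decisions(y, delta, k, reps=20000, seed=12345):
    """Monte Carlo proportions (good, wrong) of I-procedure decisions."""
    rng = random.Random(seed)
    mu = (y + delta, delta, 0.0)          # mu_(0), mu_(1), mu_(2) in units of c
    good = wrong = 0
    for _ in range(reps):
        means = [m + rng.gauss(0.0, 1.0) for m in mu]
        order = sorted(range(3), key=means.__getitem__, reverse=True)
        u = means[order[0]] - means[order[1]]
        if u >= k:                        # a decision is taken
            if order[0] == 0:
                good += 1                 # largest sample mean really from mu_(0)
            else:
                wrong += 1
    return good / reps, wrong / reps


for k in (1.0, 2.0, 3.0):
    print(k, simulate_decisions(1.0, 1.0, k))
```

For y = δ = 1 the printed proportions can be compared with the corresponding row of Table 2.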


7. Procedure based on the difference between the largest sample mean and the grand mean. McKay (1935) considers a test for outliers based on a statistic which we may call v = (x₍₀₎ − x̄)/σ. Here x₍₀₎ is the largest (or smallest) observation in a sample of size n from a normal population, and x̄ is the sample mean. He shows that, approximately, for small values of α′,

$$\alpha' = \Pr(v \ge h) \approx n\int_{h\sqrt{n/(n-1)}}^{\infty}(2\pi)^{-1/2}e^{-t^2/2}\,dt.$$

Applied to the present problem, n = 3 and v = (2x̄₍₀₎ − x̄₍₁₎ − x̄₍₂₎)/(3c). McKay's null hypothesis H₀ is that y = δ = 0, i.e. that μ₍₀₎ = μ₍₁₎ = μ₍₂₎. He did not consider the alternatives to H₀ or the power of the test, but the test seems likely to be most effective when y > 0 and δ = 0. If McKay's test is used in the sense intended, H₀ is rejected when v ≥ h, and a ‘wrong’ decision is taken when in fact y = δ = 0. For the I-procedure, on the other hand, a ‘wrong’ decision is taken when we conclude that x̄₍₀₎ comes from the population with the largest mean when in fact it comes from one of the populations with means μ₍₁₎ or μ₍₂₎. Thus the levels α and α′ do not have the same interpretation.
If, in spite of this fact, we choose k and h so that α = α′, we find:

I-procedure     Modified McKay or M-procedure     Level, α = α′
    (k)                     (h)
   1·000                   1·146                     0·2398
   2·000                   1·584                     0·07865
   3·000                   2·068                     0·01695

Calculating $v = (2\bar{x}_{(3)} - \bar{x}_{(2)} - \bar{x}_{(1)})/(3\sigma)$ for the same twelve sets of 1000 samples as we have used for u, we obtain the proportions of good and wrong decisions shown in columns 8 and 9 of Table 2. Since the twelve generated triplets are not independent, neither the values of u nor those of v form a random sample; however, this does not invalidate a comparison of the two procedures, since they are applied to the same data.
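The sampling experiment can be imitated with a small Monte Carlo sketch of the M-procedure. It assumes, for illustration only, that the three sample means have unit standard error and true means μ0 = γ + δ, μ1 = δ and μ2 = 0 (so that γ = μ0 − μ1 and δ = μ1 − μ2); the helper names are hypothetical. A decision is taken only when v > h, and it is counted as good when the largest observed mean comes from population 0. When γ = δ = 0 every decision is an error in McKay's sense, and the total decision rate should then be close to α′ for the chosen h.

```python
import random


def mckay_v(means, sigma=1.0):
    """v = (largest mean - average of the means) / sigma.

    For three means this equals (2*x_(3) - x_(2) - x_(1)) / (3*sigma).
    """
    grand_mean = sum(means) / len(means)
    return (max(means) - grand_mean) / sigma


def simulate_m_procedure(gamma, delta, h, trials=100_000, seed=1):
    """Proportions of good and wrong decisions for the M-procedure (assumed set-up)."""
    rng = random.Random(seed)
    true_means = (gamma + delta, delta, 0.0)  # mu0 >= mu1 >= mu2, unit standard error
    good = wrong = 0
    for _ in range(trials):
        x = [rng.gauss(mu, 1.0) for mu in true_means]
        if mckay_v(x) > h:                   # H0 rejected: a decision is taken
            if x.index(max(x)) == 0:         # selected population really is the best
                good += 1
            else:
                wrong += 1
    return good / trials, wrong / trials


good, wrong = simulate_m_procedure(gamma=1.0, delta=1.0, h=1.584)
print(f"gamma = delta = 1, h = 1.584: good {good:.3f}, wrong {wrong:.3f}")
```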

It will be seen that in certain cases, e.g. when γ = 0.2, δ = 2.0 and γ = 0.4, δ = 2.0, $W_M > \alpha'$ for the M-procedure. There is no reason why this should not be so, since α′ is the risk of a wrong decision when γ = δ = 0, not the risk in a more unfavourable case with δ > 0. Although we have not explored the matter theoretically, it appears likely that when γ is small and δ large, v may be large because $\bar{x}_2$ is much less than $\bar{x}_0$ and $\bar{x}_1$, not because $\bar{x}_0$ is exceptionally large. In fact, McKay's test should in this situation first be used to establish that δ > 0, assuming γ = 0, and afterwards (having discarded $\bar{x}_2$) to compare $\bar{x}_0$ and $\bar{x}_1$.

Some further investigation was carried out in the cases γ = 1, δ = 1; γ = 1, δ = 2; γ = 2, δ = 2, for which Table 2 shows that the empirical values satisfy $W_I > W_M$ and $G_I > G_M$. The critical values for the I-procedure were modified from k to k′ in such a way that the new probability of a wrong decision $W'_I$ is equal to $W_M$. This was done by ordering all the observed u values ($u_1 > u_2 > \cdots$) that could lead to a wrong decision, and taking

$$k' = \tfrac{1}{2}\left(u_{1000W_M} + u_{1000W_M+1}\right).$$

Good decisions with k′ as critical value were enumerated to give $G'_I$. Results are shown in Table 3.
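The rule used to obtain k′ can be sketched in Python as follows; `adjusted_critical_value` and the toy data are hypothetical, and r plays the role of 1000·W_M in the text (r = 79 when W_M = 0.079). Placing k′ halfway between the rth and (r + 1)th largest of the u values that would lead to a wrong decision means that exactly r of them exceed k′ when there are no ties, so the empirical wrong-decision rate becomes r/1000.

```python
def adjusted_critical_value(wrong_u, target_wrong, trials=1000):
    """Empirical k' = (u_(r) + u_(r+1)) / 2 with r = trials * target_wrong.

    wrong_u holds the observed u values that would lead to a wrong decision;
    u_(1) > u_(2) > ... denotes them in decreasing order.
    """
    ordered = sorted(wrong_u, reverse=True)
    r = round(target_wrong * trials)
    if not 1 <= r < len(ordered):
        raise ValueError("target rate not attainable from the observed values")
    # ordered[r - 1] is u_(r) and ordered[r] is u_(r+1) in 1-based notation
    return 0.5 * (ordered[r - 1] + ordered[r])


# Toy data (not from the paper): 200 distinct u values 0.00, 0.01, ..., 1.99
toy_u = [i / 100 for i in range(200)]
k_prime = adjusted_critical_value(toy_u, target_wrong=0.005)
print(k_prime, sum(u > k_prime for u in toy_u))
```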


Table 3. Comparison between the two procedures in the cases γ = 1, δ = 1; γ = 1, δ = 2 and γ = 2, δ = 2

γ      δ      k′       W′_I = W_M    G′_I     G_M
1.0    1.0    0.972    0.079         0.447    0.424
              1.512    0.028         0.290    0.229
              2.555    0.002         0.087    0.084
1.0    2.0    0.528    0.135         0.621    0.574
              1.149    0.062         0.448    0.369
              2.047    0.015         0.210    0.163
2.0    2.0    0.331    0.048         0.882    0.839
              0.805    0.020         0.800    0.698
              1.783    0.002         0.547    0.465
| 1783 | 0-002 0-547 0-465 
Although clearly the tests could not be brought into line in this way in practice, as a matter of general interest we are now able to compare the probabilities of the two procedures producing 'good' decisions when adjusted empirically so that the probabilities of 'wrong' decisions are the same. It will be seen that $G'_I$ is now always greater than $G_M$, although there must clearly have been considerable lack of precision in determining the k′ values, particularly when $W_M$ was small.

A special comparison between the two procedures was made in the case of the detection of one outlying population. In that case, both critical values were changed so that $W'_I = W'_M = \max(W_I, W_M)$. The new critical values k′ and h′ were obtained from the ordered u and v values in the same way as k′ above, with 1000W′ in place of $1000W_M$.
Results are shown in Table 4. This comparison may indicate that even in the detection of 


outliers Irwin's statistic is better than McKay's. However, any tentative conclusions drawn 
from such figures only serve to show that a more precise examination is desirable. 


Table 4. Comparison between the two procedures in the cases γ = 1.0, δ = 0 and γ = 2.0, δ = 0

γ      δ    h′       k′       W′_I = W′_M
1.0    0    1.982    1.597    0.012
            2.497    2.435    0.002
2.0    0    1.876    1.802    0.002
            2.480    2.066    0.001


8. Planning of an experiment to detect the best population. In the present case, the planning of an experiment is conditioned by an assumption about the value of the difference $\mu_0 - \mu_1$. The problem is to find the smallest sample sizes $n_0$, $n_1 = n_0\sigma_1^2/\sigma_0^2$ and $n_2 = n_0\sigma_2^2/\sigma_0^2$ such that, by applying the I-procedure, the probability of taking a wrong decision is ≤ α (0 < α < 1) and the probability of taking a good decision is ≥ 1 − β (α ≤ β < 1) if $\mu_0 - \mu_1 \ge M$ is true. The critical value k and the sample sizes $n_i$ are to be determined by the conditions:

for all δ and for all γ ≥ M/σ = Γ,  W(k, γ, δ) ≤ α  and  G(k, γ, δ) ≥ 1 − β,

where σ is the common standard error of the sample means, $\sigma^2 = \sigma_i^2/n_i$.

Let us now define the least favourable configuration. The configuration $(\gamma_0, \delta_0)$ is said to be the least favourable if, for a fixed k, for all γ ≥ Γ and for all δ,

W(k, γ, δ) ≤ W(k, γ_0, δ_0)  and  G(k, γ, δ) ≥ G(k, γ_0, δ_0).

Let us consider the dotted curve (Fig. 1) which divides the area into two regions A and B. Region B is not considered, since in it the probability of taking a good decision is always less than 1/3. Some calculations show that in region A the least favourable configuration is (Γ, 0). It follows that the critical value k and the sample sizes $n_i$, i = 0, 1, 2, are determined by the equations

W(k, M/σ, 0) = α,  G(k, M/σ, 0) = 1 − β,  $\sigma_i^2/n_i = \sigma^2$.

Fig. 1 can be used to solve the system.
For example, if α = 0.05, 1 − β = 0.80, $\sigma_0^2 = 2$, $\sigma_1^2 = 4$, $\sigma_2^2 = 10$, and if M = 1, then Fig. 1 gives M/σ = 2.15 and k = 0.49; consequently, the minimum sample sizes are respectively 10 (9.245), 19 (18.49) and 47 (46.225).
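Since $\sigma_i^2/n_i = \sigma^2$ is common to the three populations and Γ = M/σ, the minimum sample sizes are $n_i = (\Gamma\sigma_i/M)^2$ rounded up. A minimal Python sketch (the function name is hypothetical) that takes Γ as read from Fig. 1 rather than recomputing it; it reproduces 10, 19 and 47 for this example, and 21, 42 and 31 for the example of §9 (Γ = 2.27, M = 0.5, variances 1, 2 and 1.5).

```python
import math


def minimum_sample_sizes(gamma, M, variances):
    """Smallest n_i with n_i >= (Gamma * sigma_i / M)^2 for each population.

    Keeping sigma_i^2 / n_i equal makes every sample mean have the same
    standard error sigma, with M / sigma = Gamma.
    """
    exact = [gamma ** 2 * var / M ** 2 for var in variances]
    return [math.ceil(value) for value in exact], exact


sizes, exact = minimum_sample_sizes(2.15, 1.0, [2.0, 4.0, 10.0])
print(sizes, [round(v, 3) for v in exact])
```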

It is to be noted that the proposed procedure does not imply the necessity of always taking a decision. However, if α = β, then k = 0 and a decision is always taken: this is Bechhofer's solution (1954, p. 30, column k = 3, t = 1).


Fig. 1. The W-curves have negative slopes; the G-curves have positive slopes. Curves W(k, γ, 0) = α and G(k, γ, 0) = 1 − β.


9. Optimum solution. When a decision is required, it is possible to improve on Bechhofer's solution. This improvement is obtained by a repeated application of the procedure proposed in §8.

Let $\mu_0 - \mu_1 \ge M$ be the assumed information. Let α and 1 − β be, respectively, the bounds for the probabilities of taking a wrong and a good decision at each step of the procedure. The critical value k and the sample sizes $n_i$, i = 0, 1, 2, are determined as before. If, in applying the I-procedure, no decision is reached, then another set of samples of sizes $n_i$, i = 0, 1, 2, is drawn and the I-procedure is applied once again. This is repeated until a decision is reached. Since the steps are independent, if W = W(k, γ, δ) and G = G(k, γ, δ) are the probabilities of a wrong and a good decision at one step, the final decision is wrong with probability W/(W + G) and good with probability G/(W + G). With W ≤ α and G ≥ 1 − β it is readily seen that, for this new procedure, the bounds for the probabilities of taking a wrong and a good decision are, respectively, α/(1 + α − β) and (1 − β)/(1 + α − β). The expected value of the total sample size for the ith population, i = 0, 1, 2, is given by $n_i/(W + G)$. A true bound for the expected total sample size for the ith population, i = 0, 1, 2, is $n_i/(1 - \beta)$.
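The argument for the repeated procedure can be checked directly. In the sketch below, which uses hypothetical function names, W and G are the per-step probabilities of a wrong and a good decision; the number of steps is geometric with success probability W + G, so the final decision is wrong with probability W/(W + G) and the expected number of steps is 1/(W + G). The values W = 0.0145 and G = 0.696 are those of the worked example in §9, and the bounds $n_i/(1 - \beta)$ use the unrounded sample sizes 20.61, 41.22 and 30.92 from the same example. The simulation is only a sanity check on the formulae.

```python
import math
import random


def repeated_procedure(W, G):
    """Overall wrong/good probabilities and expected number of steps."""
    decide = W + G
    return W / decide, G / decide, 1.0 / decide


def expected_size_bounds(n_exact, one_minus_beta):
    """Upper bounds n_i / (1 - beta) on the expected total sample sizes, rounded up."""
    return [math.ceil(n / one_minus_beta) for n in n_exact]


def simulate(W, G, trials=100_000, seed=2):
    """Monte Carlo check: proportion of wrong final decisions and mean number of steps."""
    rng = random.Random(seed)
    wrong = steps = 0
    for _ in range(trials):
        while True:
            steps += 1
            r = rng.random()
            if r < W:            # wrong decision at this step
                wrong += 1
                break
            if r < W + G:        # good decision at this step
                break            # otherwise no decision: draw new samples
    return wrong / trials, steps / trials


F, good, mean_steps = repeated_procedure(0.0145, 0.696)
print(F, good, mean_steps, simulate(0.0145, 0.696))
print(expected_size_bounds([20.6116, 41.2232, 30.9174], 0.696))
```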
Let us now denote by F the bound for the probability of taking a wrong decision in this new procedure. The problem is to determine the values α and 1 − β (or the critical value k and the sample sizes $n_i$) in such a way that: (i) α/(1 + α − β) = F, and (ii) $n_i/(1 - \beta)$ or $\Gamma^2/(1 - \beta)$ is minimized, where Γ = M/σ as defined in §8. Since W(k, Γ, 0) = α and G(k, Γ, 0) = 1 − β, it follows that α and 1 − β are functions of k and Γ. Lagrange's method gives as solution the roots in k and Γ of the system

$$\frac{\alpha}{1+\alpha-\beta} = F, \qquad \frac{\partial}{\partial k}\left(\frac{\Gamma^2}{1-\beta}\right)\frac{\partial}{\partial \Gamma}\left(\frac{\alpha}{1+\alpha-\beta}\right) - \frac{\partial}{\partial \Gamma}\left(\frac{\Gamma^2}{1-\beta}\right)\frac{\partial}{\partial k}\left(\frac{\alpha}{1+\alpha-\beta}\right) = 0. \qquad (2)$$


A solution of the above system is obtained as follows. The following functions of k and Γ,

F = W(k, Γ, 0)/{W(k, Γ, 0) + G(k, Γ, 0)}  and  H = Γ²/G(k, Γ, 0),

are readily obtained from the tables. A family of curves H = H(F, Γ) is drawn on Fig. 2 for certain values of Γ. It can be proved that the equations of the envelope of this family of curves are precisely the system (2). Thus, an optimum solution is given by the envelope drawn on Fig. 2.
Fig. 2. Optimum solutions.


Let us consider the following example. Suppose that F = 0.02; then H = 7.4 and Γ = 2.27, according to Fig. 2. Since H = Γ²/(1 − β), it follows that 1 − β = 0.696. Using Fig. 1, it is found that k = 1.04 and W = 0.0145. Assuming that M = 0.5, and that the variances are, respectively, 1, 2 and 1.5, the sample sizes are 21 (20.61), 42 (41.22) and 31 (30.92). The bounds for the expected total sample sizes are 30, 60 and 45. The sample sizes with Bechhofer's method are 42, 85 and 64.
Bechhofer's solution Γ²(F) (Γ(F) is √N λ in Bechhofer's notation) is drawn on Fig. 2 for comparative purposes. It is readily seen that for F < 0.16 the proposed procedure improves on Bechhofer's.


Table 5. Comparison with Bechhofer's procedure

F        H(F)    Γ²(F) (Bechhofer)    Reduction (%)
0.10     4.5     4.97                 9
0.05     5.75    7.34                 22
0.01     8.6     13.08                34
0.005    9.7     15.62                38
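The last column of Table 5 appears to be the percentage saving 100(1 − H/Γ²), with H the optimum value for the repeated procedure and Γ² Bechhofer's value at the same F; this reading is inferred from the figures, which it reproduces. The short Python sketch below (hypothetical names) also recovers 1 − β = Γ²/H = 0.696 for the example with F = 0.02.

```python
def percent_reduction(H, H_bechhofer):
    """Saving of the repeated procedure relative to Bechhofer's, in per cent."""
    return 100.0 * (1.0 - H / H_bechhofer)


TABLE_5 = [  # (F, H for the repeated procedure, Bechhofer's value)
    (0.10, 4.5, 4.97),
    (0.05, 5.75, 7.34),
    (0.01, 8.6, 13.08),
    (0.005, 9.7, 15.62),
]

reductions = [round(percent_reduction(H, HB)) for _, H, HB in TABLE_5]
one_minus_beta = 2.27 ** 2 / 7.4  # 1 - beta = Gamma^2 / H in the F = 0.02 example
print(reductions, round(one_minus_beta, 3))
```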


Optimum values of k and Γ are given for the most common values of F:

F        Γ       k
0.10     1.63    0.75
0.05     1.92    0.89
0.01     2.48    1.318
0.005    2.68    1.30


In conclusion the authors wish to thank the editor and the referee for their helpful 
suggestions. 
APPROXIMATE FORMULAE FOR THE STATISTICAL
DISTRIBUTIONS OF EXTREME VALUES


By J. J. DRONKERS 
Ruychrocklaan 180, The Hague 


0·1. SUMMARY


This paper deals with the distribution function $M_{n,m}(x)$ of the order statistics (m = 1, 2, ...). For this distribution function of the mth largest value (if m is counted from above) approximate formulae are derived.
These formulae are generalizations of the corresponding approximate formulae for the distribution of the extremes proper. Successively the initial distributions f(x) are supposed to be of exponential type, Cauchy type or of limited type (finite range).

We first deal with the basic conditions to be imposed on the initial distributions f(x). An 
expansion formula has been derived for the distribution of the excess, 1 — F(x), which plays 
an important part in the investigations of this paper. 

We then consider the general formula of the distribution of the mth values, $M_{n,m}(x)$. Appropriate formulae have been derived to determine the mode and the maximum value of $M_{n,m}(x)$. The behaviour of the maximum value for varying n and m has also been studied, and approximate formulae for $M_{n,m}(x)$ are successively deduced. Every succeeding formula has a more restricted range of application. Finally, limiting expressions for $M_{n,m}(x)$ are deduced for the three types of initial distributions mentioned above. The well-known limiting functions of Fisher-Tippett and Gumbel are deduced again. Formulae for the determination of the interval of application have been deduced.


0·2. INTRODUCTION


The present situation of the theory of extreme values has been described by Gumbel (1954).
He also gives a short history of this theory and discusses some practical problems. From 
a mathematical point of view there are two ways to deal with the problem of extreme values. 

The first one, used by Fisher & Tippett (1928), starts from the functional equation

$$F(x) = F^N(a_N x + b_N),$$

which has the following meaning.

Assume we have N samples each of size n. From each sample the largest value is taken, and the maximum of these N largest values is the maximum in a sample of size Nn. Fisher & Tippett then point out that the distribution F of the largest value in a sample of size Nn is the same as the distribution of the largest value in a sample of size n, except for a linear transformation. The limiting distributions of the extreme values are then determined as three solutions of the functional equation. Each solution corresponds to a certain type of initial distribution: of exponential type, of Cauchy type, or of certain limited-range distributions.

In a recent publication Jenkinson (1955) gives a general solution of the functional equa- 
tion, which includes all three of the Fisher-Tippett solutions. See also Gnedenko (1943). 




The second method of determining these limiting distributions has been introduced by von Mises (1936); see also the paper of Wilks (1948). He supposes that, as x → ∞, the limits of certain functions A(x) and $A_1(x)$, mentioned in §0·4 of the present paper, exist and take certain values. The function A(x) has been connected with initial distributions of exponential type and $A_1(x)$ with distributions of Cauchy type.

The mathematical treatment of the theory of extreme values dealt with in this paper is closely related to the method of von Mises. However, we introduce a supposition which is different from those of von Mises and which enables us to derive approximate formulae for the distribution of the extreme values and for the deviations from the corresponding exact formulae. In comparison with the limiting formulae, these approximate formulae may be applied for smaller values of n and may therefore be called 'transitional' approximations.

A more general treatment has been given by deducing the formulae for the so-called mth values. Gumbel (1935), who also deals with this general case, considers only the initial distribution of exponential type. If m = 1, we get the formulae for the extremes themselves.

To illustrate the application of the different approximate formulae, we take the normal distribution. The well-known limiting distribution function of the extreme values, which has been derived for the exponential types, is not, however, related in a simple way to the normal distribution unless we consider very large values of n. It appears that the transitional approximate formulae derived in this paper are more useful for smaller n.


I should like to express my thanks to the referee for his comments, which gave the paper its final form.


0·3. CONDITIONS


Throughout this paper we consider populations with continuous cumulative distribution functions (cdf), which have at least three finite derivatives.

A one-dimensional continuous cdf F(x) with finite derivatives f(x) = F′(x), f′(x) = F″(x), f″(x) = F‴(x) is assumed to satisfy the following conditions:

(a) f(b) = 0, where b is the upper boundary of the distribution;
(b) $\lim_{x\to b} B(x) = k$ exists (and also $\lim_{x\to a} B(x)$, but in this paper mainly the upper boundary b is being considered).


0·4. REMARKS ON THE ASSUMPTION (b)
The value of k depends on the range of the initial distribution f(x).
THEOREM 0·4·1. If f varies over a range with a finite upper bound b, then 0 ≤ k < ∞. If b = ∞, then −1 < k ≤ 0.

We leave out of consideration the case k = ∞, e.g. f(x) ∼ −1/log(b − x).

The value of k depends on the asymptotic behaviour of f(x). The following statement can be made:
THEOREM 0·4·2. If f(x) has an infinite range and belongs to the Cauchy type*, then −1 < k < 0.

If f(x) belongs to the exponential type*, k = 0.

Let f(x) have a finite range. If we transform f(x) into f(x′) according to x′ = 1/(b − x), x < b, f(x′) will have an infinite range. If f(x′) belongs to the Cauchy type, k > 0 for f(x) (e.g. when f(x) is proportional to a power of b − x).

If f(x′) belongs to the exponential type, k = 0 for f(x).

To prove these theorems we have to put 


Bia) = (4) -&-$). tim gila) = o. 4 
5 

Then we may write f(x) ZI (7) ik f | ` (2) 
7 (y—c)— itt) dt 


Further, c has to be chosen near b, so that


| fea 


The conclusions follow from examination of the behaviour of the integral in (2), in connexion with supposition (b) of §0·3.
The condition lim B(x) = 0 is closely related to the following condition, which von Mises (1936) has set forth to derive the well-known limiting distribution of the maximum value when f(x) belongs to the exponential type

.. d 1-F(z) STIR 

Acc eae) lim 

esed ß een f 

According to l'Hôpital's rule, we may write further


«e(y—c) (y»c). 


lim A,(z) = f'--L (3) 


lim 1-F Ph B(z) I, 
4 
in virtue of the existence of lim B(x). It follows that the condition of von Mises is equivalent to the supposition lim B(x) = 0 when f(x) is of exponential type. Conversely, if the existence of the limit of von Mises has been supposed, it is not necessary that lim B(x) exists.


In case f(x) belongs to the Cauchy type (the upper bound b is infinite), von Mises supposes that

$$\lim_{x\to\infty} \frac{x f(x)}{1 - F(x)} = p > 0. \qquad (4)$$
Applying l’Hôpital’s rule twice, we get 
2 im a nt? 
lim Z- [B(r) -1] = p*2. or 1 
400 B 
Since $\lim_{x\to\infty}[-x f'(x)/f(x)] = 1 + p$, we have k = −1/(p + 1), −1 < k < 0. Also, in the case of f(x) being of Cauchy type, the condition of von Mises is equivalent to assumption (b), provided k = −1/(p + 1).


* For the definition of Cauchy types and exponential types see the paper of Gumbel (1954). Contrary to the notation of Gumbel, we call distributions where all moments exist of exponential type, e.g. f(x) ∝ exp{−(log x)^s} (s > 1, x > 0).
0·5. APPROXIMATE FORMULAE FOR THE PROBABILITY OF THE EXCESS 1 − F(x)


The function 1 − F(x), determining the probability of x being equalled or exceeded, figures in the formulae of the extreme values. The following expansion for 1 − F(x) is very useful for the determination of approximate formulae. Define y so that

$$1 - F(x) = \int_x^b f(t)\,dt = \int_y^\infty e^{-s}\,ds,$$

i.e. $y = -\ln[1 - F(x)]$; b may be finite or infinite.
From repeated integration by parts may be deduced 


o yde dx d*x dy dx dy\¥ E d'y 
—Y — dy en X -y 
f e a e at + ap dat tay d. Be dy > 


(6) 


supposing that the higher derivatives $d^k f/dx^k$ (k = 3, ..., n + 1) exist for the interval (a, b).
Then ae f) aay p 
dy | f" dde 


| d'sdy / du - dy d. [d^—1a; w] 
Generally AT [ yi A BC) 4% | dy d 


B(x). 


dro polynomial in f(x). f'(x)... f (a) 
dy” = f(x) VUE 3 
* 2 


It is easy to prove that lim ch = En — 0 if also f'(b) = 0. 
2b dy 4 tot 
The demonstration follows from (1) after integration and by remarking that, if 
b=a, f(x) = 29, lim ( = 0 (xc). 


n. 1 M 
Further the values of lim A W may be determined, supposing the limits exist and are 
yo 


First we show that id d [dmx 2l 0 
y dy|dy"dx| ` 

. dad p: 

The existence of this limit is obvious from (7). Further, it is clear that lim = would be 


that 
d'zdy 21. dix dy 3 
madide prediis Coe 
in dady | a 
so that lim 4% ga Cn. | 
Further, we put, by introducing once more the initial distribution f(x), (see (1) if n = 2) 


= T 
dy Leb (ES (n= 2,3...) | 
while $\lim_{x\to b} \phi_{n-1}(x) = 0$. It also follows that


Ae en) (gas) -o 


1— F(x) ee (Eee + Gale) 
ecran» [109.04]. a1) 


As f(t) and lim ge) = 0, the mean value theorem is applied 
415 


Hence we may write for (6) 


700g par o. -T (een. 


In the following we put y, (2) instead of — ,)). 
Finally, it may be deduced from (11) for k> —1 
` 1-(- E) (1+ E) [Gin + «+ ea 2 
1 1 E Ho as 
lim V. =0 (n>1). 


1—F(x) = 


In case k > 1 we may not consider an infinite series (n → ∞), or n even if k = 1. Formula (12) also holds good if k = 0.
In this paper mainly the case n = 1 is considered, so that

$$1 - F(x) = -\frac{f(x)^2}{f'(x)\,[1 + k + y(x)]}, \qquad \lim_{x\to b} y(x) = 0. \qquad (13)$$


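Remark 1. In the case of the normal distribution $f(x) = (2\pi)^{-1/2}e^{-x^2/2}$, for which k = 0, the expansion (12) is identical with the well-known asymptotic expansion, obtained by direct repeated partial integration,

$$1 - F(x) = \frac{f(x)}{x}\left[1 - \frac{1}{x^2} + \frac{1\cdot 3}{x^4} - \frac{1\cdot 3\cdot 5}{x^6} + \cdots\right], \qquad (14)$$

while the remainder after n terms involves the factor $1\cdot 3\cdots(2n-1)$ and an intermediate point ξ(x), x < ξ(x) < ∞.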


gur 

It is well known that in this case the error committed in stopping at any stage in the asymptotic series is less, in absolute value, than the first omitted term of the series.
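The bound on the truncation error can be checked numerically. The Python sketch below (hypothetical helper names) sums the first few bracketed terms of (14) for the standard normal distribution, compares the result with the exact tail 1 − Φ(x) = ½ erfc(x/√2) from the standard library, and compares the error with the first omitted term. The bound holds for every x > 0 because the remainder after n terms is ±1·3···(2n − 1) ∫ φ(t) t^(−2n) dt over (x, ∞), and t^(−2n) ≤ t·x^(−2n−1) there.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_tail(x):
    """Exact 1 - Phi(x) via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def series_terms(x, count):
    """Bracketed terms of (14): 1, -1/x^2, 1*3/x^4, -1*3*5/x^6, ..."""
    terms, coeff = [], 1.0
    for j in range(count):
        if j > 0:
            coeff *= 2 * j - 1          # builds 1*3*...*(2j-1)
        terms.append((-1) ** j * coeff / x ** (2 * j))
    return terms


def tail_approx(x, count):
    """Asymptotic series (14) truncated after `count` terms."""
    return normal_pdf(x) / x * sum(series_terms(x, count))


for x in (2.0, 3.0, 4.0):
    for count in range(1, 6):
        error = abs(normal_tail(x) - tail_approx(x, count))
        next_term = normal_pdf(x) / x * abs(series_terms(x, count + 1)[-1])
        print(f"x={x}, terms={count}: error {error:.2e}, next term {next_term:.2e}")
```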
Remark 2. According to (3), (13) and k = 0, the function $A_1(x)$ of von Mises may also be written

$$A_1(x) = -\frac{1}{1 + y(x)},$$

so that it appears again that the condition (3) is closely related to the condition $\lim_{x\to b} B(x) = 0$ ($\lim y(x) = 0$).
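For the normal distribution the function y(x) can be computed exactly and compared with its asymptotic series. This sketch assumes that (13) reads 1 − F(x) = −f(x)²/[f′(x){1 + k + y(x)}] with lim y(x) = 0, a reading consistent with Remark 2 and with (25); with k = 0 and f′(x) = −x f(x) this gives 1 − Φ(x) = φ(x)/[x{1 + y(x)}]. The three-term series 1/x² − 2/x⁴ + 10/x⁶ follows by inverting (14). Names are hypothetical. As x grows, y(x) tends to zero and $A_1(x) = -1/\{1 + y(x)\}$ tends to −1.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_tail(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def y_exact(x):
    """y(x) for the standard normal, assuming 1 - Phi(x) = phi(x) / (x (1 + y(x)))."""
    return normal_pdf(x) / (x * normal_tail(x)) - 1.0


def y_series(x):
    """Leading terms 1/x^2 - 2/x^4 + 10/x^6 obtained by inverting series (14)."""
    t = 1.0 / (x * x)
    return t - 2.0 * t ** 2 + 10.0 * t ** 3


for x in (3.0, 5.0, 8.0):
    a1 = -1.0 / (1.0 + y_exact(x))   # von Mises' A_1(x) in the form of Remark 2
    print(x, y_exact(x), y_series(x), a1)
```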




1. THE MAXIMUM VALUE OF $M_{n,m}(x)$


1·1. The formula for $M_{n,m}(x)$
Let x denote the mth value from the top in a sample of size n. For its probability element $M_{n,m}(x)\,dx$ we have

$$M_{n,m}(x) = n\binom{n-1}{m-1}[F(x)]^{n-m}[1-F(x)]^{m-1} f(x). \qquad (15)$$

Analogously, for the probability element $L_{n,m}(x)\,dx$ of the mth value from the bottom,

$$L_{n,m}(x) = n\binom{n-1}{m-1}[1-F(x)]^{n-m}[F(x)]^{m-1} f(x). \qquad (16)$$

In case m > ½n we consider the frequency distribution of the (n − m + 1)th value from the bottom.

Putting m = 1, we have

$$M_n(x) = n f(x)\,[F(x)]^{n-1}, \qquad (15a)$$

$$L_n(x) = n f(x)\,[1 - F(x)]^{n-1}. \qquad (15b)$$
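Formula (15) can be checked numerically. For the uniform distribution on (0, 1) it is a beta density, whose integral is 1 and whose mean is (n − m + 1)/(n + 1); for the normal distribution its integral should also be 1. The Python sketch below (hypothetical names) integrates (15) with a plain midpoint rule; it is a sanity check, not a method from the paper.

```python
import math


def order_stat_density(x, n, m, f, F):
    """(15): density of the m-th largest of n independent values with pdf f, cdf F."""
    Fx = F(x)
    return n * math.comb(n - 1, m - 1) * Fx ** (n - m) * (1.0 - Fx) ** (m - 1) * f(x)


def midpoint_integral(g, a, b, steps=20_000):
    """Composite midpoint rule for the integral of g over (a, b)."""
    width = (b - a) / steps
    return width * sum(g(a + (i + 0.5) * width) for i in range(steps))


def uniform_pdf(x):
    return 1.0


def uniform_cdf(x):
    return x


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


n, m = 10, 3
uniform_total = midpoint_integral(
    lambda x: order_stat_density(x, n, m, uniform_pdf, uniform_cdf), 0.0, 1.0)
uniform_mean = midpoint_integral(
    lambda x: x * order_stat_density(x, n, m, uniform_pdf, uniform_cdf), 0.0, 1.0)
normal_total = midpoint_integral(
    lambda x: order_stat_density(x, 50, 2, normal_pdf, normal_cdf), -8.0, 8.0)

print(uniform_total, uniform_mean, (n - m + 1) / (n + 1), normal_total)
```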

For approximate evaluation of the factorial in (15) and (16) the Stirling formula may be applied. If n, m and n − m are sufficiently large we may write

$$n\binom{n-1}{m-1} \approx m\left[\frac{n}{2\pi m(n-m)}\right]^{1/2}\frac{n^n}{m^m(n-m)^{n-m}}\exp\left[\frac{1}{12n} - \frac{1}{12m} - \frac{1}{12(n-m)}\right]. \qquad (17)$$

If n is sufficiently large and m sufficiently small, it is preferable to maintain (m − 1)!. In this case the approximation can be deduced by means of the Stirling formula

$$\frac{n!}{(n-m)!} = n^m\exp\left[\frac{m-m^2}{2n} + \frac{-2m^3+3m^2-m}{12n^2} + \cdots\right]. \qquad (18)$$

Here the usual expansion

$$\ln\left(1 - \frac{m}{n}\right) = -\frac{m}{n} - \frac{m^2}{2n^2} - \frac{m^3}{3n^3} - \cdots$$

has been used.
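The expansion (18) can be compared with the exact value of ln[n!/(n − m)!], which the Python standard library provides through `math.lgamma`. In the sketch below (hypothetical names), the neglected terms are of order m⁴/n³, so the discrepancy should be a few millionths for n = 100, m = 3 and shrink as n grows.

```python
import math


def log_falling_factorial(n, m):
    """Exact ln[n!/(n - m)!] via the log-gamma function."""
    return math.lgamma(n + 1) - math.lgamma(n - m + 1)


def log_approx_18(n, m):
    """Logarithm of the right-hand side of (18), truncated after the 1/n^2 term."""
    return (m * math.log(n)
            + (m - m * m) / (2 * n)
            + (-2 * m ** 3 + 3 * m ** 2 - m) / (12 * n ** 2))


for n, m in [(50, 2), (100, 3), (1000, 3)]:
    print(n, m, log_falling_factorial(n, m) - log_approx_18(n, m))
```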
In this paper the further treatment of the function $L_{n,m}(x)$, m < ½n, is left out of consideration, because it is analogous to that of $M_{n,m}(x)$.
Further, initial distributions f(x) are considered satisfying the conditions of §0·3, supposing that $\lim B(x) = k$ is not equal to −1 or ∞; for the rest b may be finite or infinite.


1·2. The modal value $u_{n,m}$ of $M_{n,m}(x)$
The modal value $x = u_{n,m}$, for which $M_{n,m}(x)$ has a maximum value, satisfies the equation

$$\frac{f'(u)}{f(u)} + (n-m)\frac{f(u)}{F(u)} - (m-1)\frac{f(u)}{1-F(u)} = 0, \qquad (19)$$

which may be found by differentiating $M_{n,m}(x)$, (15), with respect to x and equating the result to zero.
We may write for (19)

$$\frac{f'(u)}{f(u)^2} + \frac{n-m}{F(u)} - \frac{m-1}{1-F(u)} = 0. \qquad (20)$$

Now the following theorems can be stated:

THEOREM 1·2·1. For every n greater than an assigned $n_1$, one modal value $u_{n,m}$ of the frequency distribution $M_{n,m}(x)$ lies in the interval (c, b) (c sufficiently near b). If n increases and m is supposed to be fixed, $u_{n,m}$ increases also, while $\lim_{n\to\infty} u_{n,m} = b$.

Conversely, let n be fixed and m variable; then $u_{n,m_2} < u_{n,m_1} < u_n$ (m = 1) for $m_1 < m_2$.
Proof. An interval c < x < b exists, for which the function f′/f² is negative and steadily decreases to −∞ if x increases from c to b. Fo( J -s|(F) +1] > 0, if x is sufficiently large.
It is obvious that a value $n_1 > 0$ exists, for which u = c, so that

$$\frac{f'(c)}{f(c)^2} + \frac{n_1-m}{F(c)} - \frac{m-1}{1-F(c)} = 0.$$

Let $n > n_1$ and m fixed. Then the equation (20) has only one solution $u_{n,m} > c$. For (n − m)/F(x) decreases and (m − 1)/{1 − F(x)} steadily increases to +∞ if x → b. The statements concerning the behaviour of $u_{n,m}$ for fixed n are immediately evident.


THEOREM 1·2·2. For sufficiently large n and small m, the modal value $u_{n,m}$ satisfies approximately the relations


Jy 2E D d 
(4) = ae Gr (21) 
Ta Eto G. Qu 


If f(x) is of the exponential type, the formula derived by Gumbel (1935) is found:

$$1 - F(u_{n,m}) = \frac{m}{n}. \qquad (23)$$


Proof. In virtue of (13) we may write for (20) after some calculation 


f -1  (n-Dückty()1 (24) 
(7), „ EE YU) (mn — 1) Qc EQ) 1" 


whereas = Vr) —k- 0. 


= im (H ae 
For sufficiently large n, we may write 
(4) ELM 1 1 10 eee eee yy, (m>1) 


fs IE (Lk) =k -Promy Enak- k] 
1 n(l+k)—k ^ 
Ei ari xir ue. (eta) 


n(l+k)—k 


eee =] 
TOEISEE [i-a H- wan 


2 
As a result of eliminating $(f^2/f')_{u_{n,m}}$ between (13) and (20) we obtain

$$1 - F(u_{n,m}) = \frac{(m-1)[1+k+y(u)]+1}{(n-1)[1+k+y(u)]+1} = \frac{m(1+k)-k+(m-1)\,y(u)}{n(1+k)-k+(n-1)\,y(u)}. \qquad (25)$$

The formulae (21) and (22) hold good if the terms with y(u) may be neglected with respect to unity and |k| ≪ n(1 + k).


Remark. With the aid of (24) and (25) we may show that the second derivative of $M_{n,m}(x)$ at $x = u_{n,m}$ is negative if n is sufficiently large. Then $M_{n,m}(u)$ is a maximum.
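The modal value solves f′(u)/f(u)² + (n − m)/F(u) − (m − 1)/{1 − F(u)} = 0, obtained by setting the derivative of ln $M_{n,m}(x)$ from (15) to zero. For the standard normal distribution f′(u)/f(u)² = −u/φ(u), and every term on the left decreases as u increases, so the root is unique and bisection finds it reliably. The Python sketch below (hypothetical names) computes $u_{n,m}$, confirms that it maximizes ln $M_{n,m}(x)$, illustrates Theorem 1·2·1 ($u_{n,m}$ increases with n and decreases with m), and compares n[1 − F($u_{n,m}$)]/m with Gumbel's value 1 from (23). For the normal distribution this ratio approaches 1 only slowly; it is a little below 1 (roughly 0.94) for n = 10 000, m = 1.

```python
import math


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def lower_cdf(x):
    """F(x) for the standard normal."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def upper_tail(x):
    """1 - F(x), computed directly to avoid cancellation."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mode_equation(u, n, m):
    """Left-hand side of the modal-value equation, using f'(u)/f(u)^2 = -u/phi(u)."""
    return -u / phi(u) + (n - m) / lower_cdf(u) - (m - 1) / upper_tail(u)


def modal_value(n, m, lo=0.0, hi=8.0, tol=1e-12):
    """Bisection for u_{n,m}; assumes m < (n + 1) / 2 so the root lies in (lo, hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mode_equation(mid, n, m) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def log_density(x, n, m):
    """ln M_{n,m}(x) from (15) for the standard normal."""
    return (math.log(n) + math.log(math.comb(n - 1, m - 1)) + math.log(phi(x))
            + (n - m) * math.log(lower_cdf(x)) + (m - 1) * math.log(upper_tail(x)))


for n, m in [(100, 1), (1000, 1), (1000, 2), (1000, 3), (10_000, 1)]:
    u = modal_value(n, m)
    print(n, m, round(u, 4), n * upper_tail(u) / m)
```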


1·3. The behaviour of $u_{n,m}$ in case of increasing n or m

We suppose n and m to be continuous real variables. Let n be sufficiently large and m fixed, but small with respect to n. Then we deduce from (20), by applying the formulae (13), (24) and (25) (the index n, m has been omitted),


du Ín(1 o b) - k} mu) n - 
40 99 miren) Bep * (sd ei * 0e) | 


= 


= G W 


If the quantity y(u) may be neglected with respect to unity for sufficiently large n an 


an juae (> h, E 
. du (f 1 a 2700 
or, according to (21), ne 3 ITU E] 0-5) i 


: — * Lv 
ex zn oF = r exist. Then we may easily show that r = 0 in two cases, namely if b is finit 
* 


and f(b) = 0, or if b = co and values s > 2 exist, so that æf (a) — 0 for x > co. Further k 
if b = oo and values s< 2 exist, so that z*f(z)-» oo for x — oo. In certain cases Uke 
Jide) = p*a-?, we have transitions, then r = tp”. Hence we may state: 


THEOREM 1·3·1. If n is sufficiently large and k > −½, and therefore f(x) tends to zero more rapidly than a density $f_1(x) \propto x^{-2}$, du/dn is a positive, steadily decreasing function of n, while lim du/dn = 0. If −1 < k < −½, so that f(x) tends to zero more slowly than $f_1(x)$, du/dn is a steadily increasing function, while lim du/dn = ∞. If k = −½ it is possible that


lim duſdn is finite or co (e.g. if f(x) = (log er- (7 0), this limit equals zero; 


n 


f(a) = DE o > 0), this limit is infinite and for f,(x), p*/(m + 1)). 


Similarly, for a fixed value of n and variable m, it follows (see (20) and (25)) that
du _ - e [ier eee (28) 

dm dn |(m—1)(1+k+y(u)) +1)" 
The formula for du/dn is given in (26). 
The approximate formula for du/dm is (see also (21) and (27))

$$\frac{du}{dm} \approx \left(\frac{f}{f'}\right)_u \frac{1}{m(1+k)-k} \qquad (n \text{ fixed},\ k > -1). \qquad (29)$$

It may be easily shown that, if f(x) has a finite upper bound, $\lim_{x\to b} f/f' = 0$ (k ≥ 0). Further, if f(x) is of the Cauchy type, $\lim_{x\to\infty} f/f' = -\infty$ (−1 < k < 0). Hence:

THEOREM 1·3·2. The function du/dm is negative for u sufficiently near b. If f(x) has a finite upper bound, $\lim_{u\to b} du/dm = 0$, and this limit equals −∞ if f(x) is of the Cauchy type. However, in case f(x) is of the exponential type, $-\infty \le \lim_{u\to b} du/dm \le 0$.


1·4. The maximum value of $M_{n,m}(x)$
From (15), (24) and (25) it follows after some calculation that


My u(t) (Hau- Jl zoj [zz] aeree) (30) 
P= ( -1). 


Here f′/f is introduced instead of f, in virtue of the fact that in practical applications, with increasing x, f′/f changes more slowly than f.
If en, m?|n and (m—p)/n are sufficiently small we may write for (30) (see also (18)) 


4 m p m 
Mu - (5) cose E) aerem ace) (31) 


i oe P eo(£) «o(7; ;. be 
Finally, if we suppose that n (and therefore u) is so large that |y(u)| ≪ 1 + k, we write


for (31) 1 * 
Mnt = -C -aa as e 


—k?-2k—-1 
Iun CE a egg Ode). une. 


In the case of the extreme value itself (m = 1), it follows from (31) that


Mu) - (4). eV + pia), (33) 
2 A 
A ng 0000 (us < €). 
If i and i may be neglected, we may write 
LA m k 
Auto -CH l- 


Musee e) e. (35) 


[oo (34) 
1·5. Example
As an example we determine approximate formulae for the maximum value of $M_{n,m}(x)$ in the case of the normal distribution $f(x) = (2\pi)^{-1/2}e^{-x^2/2}$. From (13) and (14) follows

$$y(u) = \frac{1}{u^2} - \frac{2}{u^4} + \frac{10}{u^6} - \cdots,$$
so that in virtue of (35) 
1 15 
Mu) = uexp(- 12-2 ) 


. 

According to (24) the following relation exists between u and n 
15 

ty Erne o pes (38) 


To judge the accuracy, we have computed the values of u and $M_n(u)$ from the various approximations which follow from (37) and (38) by omitting the smaller terms.

For comparison the exact values of u and $M_n(u)$ have been determined in the case $n = 3^s$ (s = 3, ..., 10) from the graphs of Figs. 2 and 3. These graphs also show the behaviour of $M_n(x)$ according to the exact formula (15a).

A sufficient agreement appears to exist between the approximate values of u and the values determined from the exact graphs.
Concerning the values of $M_n(u)$ computed from the various approximation formulae, the deviations are more important. In Table 1 are given the values of $M_n(u)$ computed from


3 
(a) M. (u) IU Ut: — (5) M,u) = uexp| -1+5- sl ; 


(c) M,(u) = wexp| - 1 +23 (d) M,(u) = wer. 
(e) M,(u) = wexp| - 1 +5 — ail: 


Further have been mentioned the values of the neglected term ju (2) l. 


Table 1. Normal distribution: values of u, $M_n(u)$ and the neglected term

(b)     (c)     (d)
0.71    0.91    0.69
0.92    1.02    0.85
1.07    1.13    0.99
1.20    1.24    1.11
1.31    1.34    1.23
1.41    1.44    1.33
1.51    1.53    1.44
1.61    1.62    1.53




The values according to (d) appear to differ too much from the exact ones. A difference 
smaller than 1 % will be obtained if u > 10 and therefore n > 3. If n » 3°, we may apply (c), 
while (b) gives sufficiently accurate values for n > 3*. A further approximate formula of this 
kind, e.g. 

À “egy d 
M,,(u) = wexp| 5 


is still more unfavourable for small u, because the applied expansion has an asymptotic 
character. According to the table the formula (e) in which the coefficient of u has been 
taken appears to give the most favourable approximation for n > 3°. 

If we consider the initial distribution 


fce (c,d>0), 


for «>a, it appears that k = 0 and % =. Then u and M, (u) are determined by the 
formulae (see (24) and (34)) 


^ d e UNE VAM 
a EE and M, au) = di ji ^ 


provided jt ~ m/(2n) is sufficiently small with respect to unity, see (33). 
1·6. The behaviour of the maximum value of $M_{n,m}(x)$ for increasing n or m
At first we suppose m fixed, n variable and sufficiently large, and m small with respect to n, so that (34) may be applied.
Then we write shortly 


D 


A. ULE () gn, ). gem, so (m2lk»-1) (39) 
In virtue of the behaviour of f’/f as x increases, we may state: 


THEOREM 1·6·1. If f(x) has a finite upper bound, $\lim_{u\to b} M_{n,m}(u) = \infty$ (k ≥ 0). If f(x) is of the Cauchy type, this limit is zero (−1 < k < 0). However, if k = 0, and therefore f(x) is of the exponential type, $\lim_{u\to b} M_{n,m}(u)$ is zero, finite or infinite, depending on f(x). In this case:

(a) Let f(x) tend to zero as rapidly as, or more slowly than, $e^{-ax^r}$ (r < 1, a > 0). Then lim f′/f = 0 and therefore lim M = 0.

(b) Let f(x) tend to zero as rapidly as, or more rapidly than, $e^{-ax^r}$ (r > 1, a > 0). Then lim(−f′/f) = ∞ and $\lim M_{n,m}(u) = \infty$.

(c) If f(x) tends to zero like exp{lx + Q(x)} (l < 0, lim Q′(x) = 0), then lim f′/f = l and $\lim M_{n,m}(u)$ is finite.
From (15), (19) and (25) it follows, for variable n and m fixed, that
dln M(u) d —1 | dn 
gu un = [rti ao "(7 iei du 

EE plu) 50 0) dn 
n(l+k) 2n?(1+k)? — [n(1+hk)—k](1+h) n |du' ( 

while dn/du has been determined in (26). Here we have also applied the Stirling asymptotic series for the expansion of ln n! and the well-known expansion for


l+y, Wie es IA 
teni u -24ü-—F) 


In F = In 


Further, let |k| ≪ n(1 + k).
From (21), (26), (39) and (40) follows immediately:


THEOREM 1:6:2. (a) Let f(x) have a finite range (k> 0). Then for large n and hence w 


sufficiently near b dM ^2 
Genes 
. aM (u) i 
whereas for n — oo and thus u ^ b, lim 5 (we have omitted the index , m). 
utb 
(b) If f(x) belongs to the Cauchy type (−1 < k < 0), then $dM_{n,m}(u)/du < 0$ and $\lim_{u\to\infty} dM_{n,m}(u)/du = 0$.
(c) If f(x) belongs to the exponential type (k = 0), in virtue of (21), (26), (34) and (40)
follows d a at f men 
«ree wmm. e 


For the normal distribution it appears that for large values of u 
d. M (u) e 
du (m—1)!* 
Here, in virtue of (36), we have put y(u) ≈ u^(−2). This result may also be derived from (34).
Analogously, we may formulate a similar theorem about the behaviour of $M_{n,m}(u)$ if m is variable and n is given, using the following formula (n is fixed and m ≥ 2)


(res) -rf 


The formula for dm/du is given in (28). 


2. APPROXIMATE FORMULAE FOR $M_{n,m}(x)$
2·1·0. The first approximate formula


From a practical point of view this formula may already be applied for values of n which are not large (e.g. n = 50).
Instead of (15) we write

$$M_{n,m}(x) = n\binom{n-1}{m-1} f(x)\,[1-F(x)]^{m-1}\exp\{-(n-m+1)[1-F(x)]\}\,[1 - \lambda(x, n-m+1)]. \qquad (43)$$
If |λ(x, n − m + 1)| < ε for b > x > $x_{\varepsilon,p}$, p = n − m + 1 (the upper bound may be finite or infinite), we obtain the approximate formula

$$M_{n,m}(x) \approx n\binom{n-1}{m-1} f(x)\,[1-F(x)]^{m-1}\exp\{-(n-m+1)[1-F(x)]\}. \qquad (44)$$

With m = 1 the corresponding expression for the distribution of the extreme value is

$$M_n(x) \approx n f(x)\exp\{-n[1 - F(x)]\}. \qquad (45)$$
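The quality of (45) is easy to examine for the normal distribution with n = 50, the size quoted above as already practical. The ratio of (45) to the exact (15a) is exp{−nq}/(1 − q)^(n−1) with q = 1 − F(x); it is close to 1 when q is of order 1/n, i.e. in the region where the largest of n values usually falls, but it deteriorates quickly further down. The Python sketch below (hypothetical names) prints the relative error at a few points.

```python
import math


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def upper_tail(x):
    """1 - F(x) for the standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def exact_max_density(x, n):
    """(15a): M_n(x) = n f(x) F(x)^(n-1)."""
    return n * phi(x) * (1.0 - upper_tail(x)) ** (n - 1)


def approx_max_density(x, n):
    """(45): M_n(x) ~ n f(x) exp{-n [1 - F(x)]}."""
    return n * phi(x) * math.exp(-n * upper_tail(x))


def relative_error(x, n):
    return approx_max_density(x, n) / exact_max_density(x, n) - 1.0


for x in (0.5, 1.5, 2.0, 2.5, 3.0):
    print(f"x = {x}: relative error of (45) for n = 50 is {relative_error(x, 50):+.4f}")
```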


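If m³/(6n²) is sufficiently small, we may approximate the factorial in virtue of (18). So the following approximation is obtained for (15):

$$M_{n,m}(x) \approx \frac{n^m}{(m-1)!}\exp\left(\frac{m-m^2}{2n}\right) f(x)\,[1-F(x)]^{m-1}\exp\{-(n-m+1)[1-F(x)]\}. \qquad (46)$$

Finally, if m²/(2n) is sufficiently small we may write

$$M_{n,m}(x) \approx \frac{n^m}{(m-1)!}\, f(x)\,[1-F(x)]^{m-1}\exp\{-n[1-F(x)]\}. \qquad (47)$$

Then the modal value u (or $u_{n,m}$) satisfies the equation (compare (20))

$$\left(\frac{f'}{f^2}\right)_u - \frac{m-1}{1-F(u)} + n = 0. \qquad (47a)$$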


2·1·1. The interval of application

In connexion with the neglect of the function $h(x, n-m+1)$ in (43), two theorems about the behaviour of this function are formulated.

To illustrate the practical meaning of these theorems, Fig. 1 reproduces graphs showing the relation between $h(x,p)$ and $1 - F(x)$ for $p = n - m + 1 = 6, 8, \ldots, 50$ (see formula (48)).

From (15) and (43) it follows that

$$\ln[1-h(x, n-m+1)] = (n-m+1)[1-F(x)] + (n-m)\ln F(x). \qquad (48)$$
In virtue of $\ln F(x) = -\sum_{\nu=1}^{\infty}[1-F(x)]^{\nu}/\nu$, we deduce

$$\ln[1-h(x,p)] = [1-F(x)] - (n-m)\sum_{\nu=2}^{\infty}\frac{[1-F(x)]^{\nu}}{\nu} \qquad (p = n-m+1). \qquad (49)$$
THEOREM 2·1·1·1. (a) A value $x_{0,p}$ ($p = n-m+1$) exists for which $h(x,p) = 0$. The value of $F(x_{0,p})$ may be deduced from the equation
$$(n-m+1)[1-F(x)] + (n-m)\ln F(x) = 0$$
by means of the well-known expansion in Lagrange series. Then follows

$$1 - F(x_{0,p}) = \frac{2}{n-m} - \frac{8}{3(n-m)^2} + \frac{28}{9(n-m)^3} - \cdots. \qquad (50)$$

(b) The function $h(x,p)$ is negative for $x_{0,p} < x < b$, but for all values of $x < x_{0,p}$, $h(x,p)$ is positive and a steadily increasing function as $x$ decreases.
Fig. 1. The relation between $h(x,p)$ and $1 - F(x)$ for small values of $p$.


(c) For a value $v_p$, defined by

$$1 - F(v_p) = \frac{1}{p} \qquad (x_{0,p} < v_p < b),$$

$h(x,p)$ has a minimum value.
From (48) follows 
1 1 0 


(0«6 « 1). 


fie == 2n—2m41 6(2n—2m+1)? (2n—2m41) 
(d) Finally, let $x$ be given and let $p$ increase. Then $h(x,p)$ increases likewise, and $h(v_p, p) \le h(x,p) < 1$.
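Theorem 2·1·1·1 can be explored with a few lines of code. The sketch below writes $h$ as a function of $q = 1 - F(x)$ using (48), locates the non-trivial zero $q_0 = 1 - F(x_{0,p})$ by bisection and compares it with the three-term series (50); it also checks that the minimum lies at $q = 1/p$ as in (c). This is an illustration under these stated conventions, not the author's computation.

```python
import math

def h_of_q(q, p):
    """h(x, p) from (48), in terms of q = 1 - F(x) and p = n - m + 1."""
    return 1.0 - math.exp(p * q + (p - 1) * math.log1p(-q))

def root_q0(p, iters=200):
    """Non-trivial zero of h. On [1/p, 0.5] h increases from its (negative)
    minimum to a positive value (valid for p >= 4), so bisection applies."""
    lo, hi = 1.0 / p, 0.5
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h_of_q(mid, p) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def series_50(n_minus_m):
    """First three terms of the Lagrange-series expansion (50)."""
    N = float(n_minus_m)
    return 2.0 / N - 8.0 / (3.0 * N ** 2) + 28.0 / (9.0 * N ** 3)

if __name__ == "__main__":
    p = 50
    print(root_q0(p), series_50(p - 1), h_of_q(1.0 / p, p))
```

For $p = 50$ the series and the bisection root agree to better than $10^{-5}$, and the minimum value of $h$ is close to $-1/(2p)$.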
THEOREM 2·1·1·2. (a) Let $p_1 = n_1 - m_1 + 1$ be defined by $h(v_{p_1}) = -\varepsilon$, where $\varepsilon$ is the maximum value which may be neglected with respect to unity. Let furthermore $p \ge p_1$, and let $x_{\varepsilon,p}$ be determined by $h(x_{\varepsilon,p}, p) = \varepsilon$ or by the approximate expression


1—F(z, 5) - (p =n—m+1). 


Then we have $|h(x,p)| \le \varepsilon$ if $x_{\varepsilon,p} \le x \le b$. Consequently, for this interval the approximate formulae (44) and (47) hold good. In virtue of (53), for large values of $n$ and small $\varepsilon$ the lower boundary value of $M_{n,m}(x)$ is approximately (see (47))
M, (rag) EEE [1+ (1 + 2n) et exp [— 1.— (122) 
(b) Let $p < p_1$, so that $|h(v_p)| > \varepsilon$.
Then | A(x, p) | «€ for two intervals (2, p *) and (2, p» 0), whereas 


Voip 


and Bl%q,p) =6, Bs) = fi, v) = . 


«m, p X Xs p <Up< Xn € b, 
Proof (a). According to Theorem 2·1·1·1(c), $h(x,p)$ has a minimum for $x = v_p$. Hence $|h(x,p)| \le \varepsilon$ for $p \ge p_1$ and $x_{\varepsilon,p} \le x \le b$. Moreover, $x_{\varepsilon,p} < x_{0,p}$ in virtue of Theorem 2·1·1·1(b). The approximate formula (53) may be determined from (49) by assuming that $x$ is sufficiently near $b$, so that approximately
2in (1-2) = (1 Zi, JJ] ($1 — Pex, 

The validity of the corresponding statement in (b) is also obvious in virtue of $|h(v_p)| > \varepsilon$ for $p < p_1$. Then the interval $(x_{\varepsilon,p}, b)$ is separated into three subintervals, whereas $|h(x,p)| < \varepsilon$ for the first and the last subinterval, but $|h(x,p)| > \varepsilon$ for the middle one (see Fig. 1, case $p = 8$).
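Theorem 2·1·1·2 suggests a simple procedure: find the smallest $p_1$ with $h(v_{p_1}) \ge -\varepsilon$, then, for $p \ge p_1$, locate $x_{\varepsilon,p}$ where $h$ rises to $+\varepsilon$. The sketch below does both in terms of $q = 1 - F(x)$; the value $\varepsilon = 0.05$ and all function names are assumptions made for illustration.

```python
import math

def h_of_q(q, p):
    """h(x, p) from (48), with q = 1 - F(x) and p = n - m + 1."""
    return 1.0 - math.exp(p * q + (p - 1) * math.log1p(-q))

def h_min(p):
    """Minimum of h over x, attained where 1 - F(v_p) = 1/p."""
    return h_of_q(1.0 / p, p)

def smallest_p1(eps, p_start=2):
    """Smallest integer p with h(v_p) >= -eps."""
    p = p_start
    while h_min(p) < -eps:
        p += 1
    return p

def q_eps(p, eps, iters=200):
    """q = 1 - F(x_{eps,p}): h increases on [1/p, 0.9] and crosses +eps there."""
    lo, hi = 1.0 / p, 0.9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h_of_q(mid, p) < eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    eps = 0.05
    print("p1 =", smallest_p1(eps), " q_eps(20) =", q_eps(20, eps))
```

For $\varepsilon = 0.05$ this gives $p_1 = 11$; for any $p \ge p_1$ the bound $|h| \le \varepsilon$ then holds on the whole range $0 < 1 - F(x) \le 1 - F(x_{\varepsilon,p})$.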


2·1·2. The interval contains the modal value if $m^2/n$ is small

The maximum value of $M_{n,m}(x)$ may be calculated by (44) or (47) if the modal value $u_{n,m}$ is situated in the interval $(x_{\varepsilon,p}, b)$. The required condition therefore follows from
$$1 - F(u_{n,m}) \le 1 - F(x_{\varepsilon,p}).$$

We shall suppose that $p \ge p_1$, so that $1 - F(x_{\varepsilon,p})$ is determined by (53).

Further, $1 - F(u)$ (we omit the index $n, m$) follows from (25).

If $y(u)$ is small, $m \ll n$ and $|k| \ll n(1+k)$, we may write


1 k — Ld. rh 
1-709) = 2 (n- 153) xs -h 


or L- Fw) <+ +n- maj. 
The condition, t,, m> z, p, has been satisfied if 
m- I- e li- a-. (im y(u) = = 0). (55) 


In general we may conclude from (55) that $m^2/n$, or $m^2/(2n)$, must be sufficiently small.


For the initial distribution of the extreme value (m = 1), it follows from (25) if k = 0 
1-F(u)x 1 — (9). 
"Therefore from (51) and lim y(u) = 0, it follows that the difference between u and t, 
* 


decreases with increasing m. 
In the case of the normal distribution (see 36) 


Floh- Fler) E vie) 13 LT 


so that the maximum value of M. be Vy (M) oven o ual etum o, 
e.g. n = 25, or smaller. ; 

As another example we mention the distribution $f(x) = x^{-2}$. Then $1 - F(u) = 2/n$, so that in virtue of (50) $u \approx x_{0,p}$, while $h(u) \approx 0$.
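For the example $f(x) = x^{-2}$ the modes can be written down explicitly. Assuming the support $x \ge 1$ (not stated in the text), the exact density of the maximum is $n x^{-2}(1 - 1/x)^{n-1}$, with mode $(n+1)/2$, while (45) has its mode at $n/2$; both give $1 - F(u) \approx 2/n$, close to (50) with $m = 1$. The sketch checks this by a grid search; names are hypothetical.

```python
import math

def pareto_exact(x, n):
    """Exact density of the largest of n values from f(x) = x**-2, x >= 1."""
    F = 1.0 - 1.0 / x
    return n * x ** -2 * F ** (n - 1)

def pareto_45(x, n):
    """Approximation (45): n f(x) exp{-n[1 - F(x)]}, with 1 - F(x) = 1/x."""
    return n * x ** -2 * math.exp(-n / x)

def argmax_on_grid(g, a, b, steps):
    xs = (a + (b - a) * i / steps for i in range(steps + 1))
    return max(xs, key=g)

def q0_series(n_minus_m):
    """Three-term series (50) for 1 - F(x_{0,p})."""
    N = float(n_minus_m)
    return 2.0 / N - 8.0 / (3.0 * N ** 2) + 28.0 / (9.0 * N ** 3)

if __name__ == "__main__":
    n = 101
    u_exact = argmax_on_grid(lambda x: pareto_exact(x, n), 1.0, 200.0, 199000)
    u_45 = argmax_on_grid(lambda x: pareto_45(x, n), 1.0, 200.0, 199000)
    print(1 / u_exact, 1 / u_45, q0_series(n - 1), 2 / n)
```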
2·2. The second approximate formula for $M_{n,m}(x)$


In this section the method for deriving approximate formulae differs from that of the next section, which is based on the development of the function $1 - F(x)$ of § 0·5.
The starting-point for the derivation of these formulae is (47), supposing that $m^2/(2n)$ is sufficiently small. This formula holds good for $x_{\varepsilon,p} < x < b$ (see § 2·1·1), while this interval contains $u_{n,m}$ (see § 2·1·2). We may write


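$$\frac{M(x)}{M(u)} = \frac{f(x)}{f(u)}\left[\frac{1-F(x)}{1-F(u)}\right]^{m-1}\exp\{-n[F(u)-F(x)]\} \qquad (56)$$

and further

$$\left[\frac{1-F(x)}{1-F(u)}\right]^{m-1} = \exp\left[(m-1)\ln\frac{1+v}{1-v}\right] = \exp\left[2(m-1)\sum_{p=0}^{\infty}\frac{v^{2p+1}}{2p+1}\right] \qquad (m > 1), \qquad (57)$$

where we have put

$$v = \frac{F(u)-F(x)}{2-F(u)-F(x)}.$$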
It can be shown that | v/(v-- 1) | «lif æ b. 
Hence in virtue of (47a) and after some computation we may write 


MG) fe)... [(f 2 5 
Mu) aoe ( ro F(x)] — 20 — JII +2(m— DE 2571177 Mu 
This holds for $b > x > x_{\varepsilon,p}$. We often use an approximation that takes only some terms of the series.
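Identity (57) is easy to verify numerically. The sketch below compares the direct power with the series form for a few hypothetical values of $F(x)$ and $F(u)$, and shows how little is lost by keeping only the first term of the series when $x$ is close to $u$.

```python
import math

def ratio_direct(Fx, Fu, m):
    """[(1 - F(x)) / (1 - F(u))]^(m-1), computed directly."""
    return ((1.0 - Fx) / (1.0 - Fu)) ** (m - 1)

def ratio_series(Fx, Fu, m, terms=80):
    """Right side of (57): exp[2(m-1) sum_{p>=0} v^(2p+1)/(2p+1)]."""
    v = (Fu - Fx) / (2.0 - Fu - Fx)
    s = sum(v ** (2 * p + 1) / (2 * p + 1) for p in range(terms))
    return math.exp(2.0 * (m - 1) * s)

if __name__ == "__main__":
    for Fx, Fu, m in ((0.90, 0.97, 4), (0.95, 0.97, 3), (0.965, 0.97, 2)):
        print(Fx, Fu, m, ratio_direct(Fx, Fu, m), ratio_series(Fx, Fu, m))
```

The series is simply $2(m-1)\operatorname{artanh} v$, so it converges for every $x < b$ because $|v| < 1$; near $u$, where $v$ is small, one term already gives a relative error of order $v^3$.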

Now we derive a further approximate formula from (58). We put $f(x) = e^{h(x)}$ and introduce the following Taylor series, which holds good for $|x-u| < \delta$:

$$h(x) - h(u) = h'(u)(x-u) + \tfrac{1}{2}h''(u)(x-u)^2 + \tfrac{1}{6}h'''(u)(x-u)^3 + \cdots \qquad (h' = f'/f).$$
Further, we put 
r—3) — 
F(x) = F(u)+f(u)(e—u) +... v ai 1 = uc T 5 177 


(u< <s or * , V <u). 
After substitution in (58) we find after some computation 


M(x) (- (= (x — u)? 
ario |e ay te spo. er. 


Then it appears that 
PS xs 
aG - 6) acr]. 


a- [0 (ete rae]. 2 


We may expect that 45 « 0, for in virtue of (1) 


GH Ce- -(5)'a-k-e) timet . 
Further in virtue of (13) E 


arp vx): dimy . 
Therefore, for sufficiently large $x$,
a,x —(f'/f (1 +k) mE b) E] <0, (m> 1). 
In case of the initial distribution of the extreme value (m = 1) the coefficient a, equals 


1 
I. : 
Example. According to (45), for the normal distribution we may write, if $x > x_{\varepsilon,p}$,
$$M_{n}(x) = (2\pi)^{-1/2}\, n\exp\left\{-\tfrac{1}{2}x^2 - n[1-F(x)]\right\}. \qquad (64)$$
In virtue of (38a), (61) and (63) it follows that 

Mx). " a Si- (e—eP 65 

acto = exp[—dutaye—w +0 X4 - A- 0, (65) 

Here the $H_\nu(u)$ are the well-known Tchebycheff–Hermite polynomials (see Kendall (1947), p. 145):
$$H_0(u) = 1;\quad H_1(u) = u;\quad H_2(u) = u^2 - 1;\quad H_3(u) = u^3 - 3u;\quad H_4(u) = u^4 - 6u^2 + 3;\quad \text{etc.} \qquad (66)$$
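The Tchebycheff–Hermite polynomials in (66) satisfy the standard recurrence $H_{r+1}(u) = uH_r(u) - rH_{r-1}(u)$ with $H_0 = 1$ and $H_1 = u$. A minimal sketch, assuming this (probabilists') convention, which agrees with the polynomials listed above:

```python
def hermite(r: int, u: float) -> float:
    """Tchebycheff-Hermite polynomial H_r(u) via H_{k+1} = u H_k - k H_{k-1}."""
    h_prev, h = 1.0, u          # H_0 and H_1
    if r == 0:
        return h_prev
    for k in range(1, r):
        h_prev, h = h, u * h - k * h_prev
    return h

if __name__ == "__main__":
    print([hermite(r, 2.0) for r in range(5)])
```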


Practically, the formula (65) has been applied for the cases n = 3°, = 3, , 8, considering 
the terms with H, and H, of the series. Further, we have applied the formula (e) of § 1-5 for 
the computation of $M_n(u)$. The results of these computations are shown in Fig. 2 in comparison with the graphs computed from the exact formula (15a). The differences between the approximated graphs and the exact ones then appear to be sufficiently small; only in the tails are the deviations more important.
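The observation that deviations are concentrated in the tails and shrink with $n$ can be checked for the largest normal observation ($m = 1$). The sketch below compares the exact density $n f(x)F^{n-1}(x)$ with (45) on a window around the point where $1 - F(x) = 1/n$, for $n = 3^s$; the window and grid are arbitrary illustrative choices, not taken from the text.

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_sf(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def max_density_exact(x, n):
    """Exact density of the largest of n standard normals: n f(x) F(x)^(n-1)."""
    return n * normal_pdf(x) * (1.0 - normal_sf(x)) ** (n - 1)

def max_density_45(x, n):
    """Approximation (45): n f(x) exp{-n[1 - F(x)]}."""
    return n * normal_pdf(x) * math.exp(-n * normal_sf(x))

def centre(n, lo=0.0, hi=10.0):
    """Point where 1 - F(x) = 1/n, by bisection (normal_sf is decreasing)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if normal_sf(mid) > 1.0 / n:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def max_rel_error(n, below=1.0, above=1.5, steps=500):
    """Largest |(45)/exact - 1| on [c - below, c + above], c = centre(n)."""
    c = centre(n)
    worst = 0.0
    for i in range(steps + 1):
        x = c - below + (below + above) * i / steps
        worst = max(worst, abs(max_density_45(x, n) / max_density_exact(x, n) - 1.0))
    return worst

if __name__ == "__main__":
    for s in range(3, 8):
        print(3 ** s, round(max_rel_error(3 ** s), 4))
```

The largest relative error sits at the lower end of the window, where $1 - F(x)$ is largest, and it decreases steadily as $n$ grows.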

2·3·0. Approximate formulae for $M_{n,m}(x)$ based on the formulae for $1 - F(x)$ of § 0·5

The starting-point for the deduction of these formulae is again formula (47), which holds good for the interval $(x_{\varepsilon,p}, b)$ (see § 2·1·1). Further, for $1 - F(x)$ formula (12) is applied, supposing that the initial distribution $f(x)$ satisfies the various conditions required for the deduction of that formula.

We suppose that the series in (12) converges or is an asymptotic expansion. In the latter case, if $x$ is fixed, the terms decrease for $\nu = 1, \ldots, p$, but later on ($\nu = p+1, \ldots$) they increase rapidly and without limit. This fact can be used to approximate $1 - F(x)$, but not to any prescribed degree of accuracy: the error cannot be made smaller than the magnitude of the last terms retained in the series.

In this case may be written (see (12) and (47)), if y, (z) is sufficiently small 


ET Jeu (a) er | + Het e 


M. ml) = (IFE) m-i) 1-(-b» 
Merc 


In practical applications the case p = 2 is important. Then see (1) 
à) = - (2) + Gipi) = 0). 


G-t 


Let | da(x) SEI x(x) | or (see (10)) 6 


q 
Fig. 2. Exact and approximated graphs for the normal extremes. Exact: $M(x) = n f(x)F^{n-1}(x)$, $F(x) = \int_{-\infty}^{x} f(t)\,dt$. Approximate formula: (65).


Let further /r) =e(1—k)? if r e and l. 
Then it follows from (67) 


nr) = ER E t) Gr" | - 


s hte -O H- @ 
2·3·1. Example

Again the normal distribution is taken into consideration. Then from (68), with $k = 0$, it follows that


zx 


M, r) = (ar) Any — i (i-a) exp|-m2—(eny4tn (2-3) exp 2 


In virtue of (38a) we may further write, if $x_{\varepsilon,p} < x < \infty$ (see § 2·1·2),


en- mp rt- (ieu) 0 
The corresponding formula for the initial distribution of the extreme value itself is
u - ju u 
M2) = wes [7,7 - (1-5) e *;* (10) 


so that JM (uo) wexp| - 145 " 


Fig. 3. Exact and approximated graphs for the normal extremes (exact graph: see also Fig. 2; the other curves show approximate formulae).


By numerical application it appears that (e) may be applied if $n > 3^2$, and a deviation smaller than 2·0 % will then be obtained. In Fig. 3 the graphs of $M_n(x)$ are shown for $n = 3^s$ ($s = 4, \ldots, 7$), computed according to (70). In this figure the exact graphs of $M_n(x)$, computed from (15a), are also shown. It follows that with increasing $n$ the deviations between the approximated graphs and the exact ones become smaller.

¹ In § 1·5 this formula has been marked by (e).


Especially for smaller values of n the formula 


u?— x? u 1 3 2 a2 z 
Mz) = wexp| 2 Sra gu) exp s | m) 


gives a more favourable approximation, since in this case $M_n(u)$ becomes identical with formula (e) of § 1·5. However, for these small values of $n$ ($n < 3^4$), formula (65) gives a still better approximation (see Fig. 2).

For larger values of $n$ the differences between (70) and (71) become continually smaller.
2·4. Limiting formulae for $M_{n,m}(x)$

2·4·1. Preliminary

We deduce approximate formulae for $f(x)/f(u)$, $[1-F(x)]/[1-F(u)]$ and $F(x) - F(u)$.


In virtue of (//) = k— ige), lim $3(z) = 0, it follows by integration that 
E 


Similarly, we derive 


| 

| 

| 

fe _ REIS d : 

Fay ^ eP 16. rame uvm | 

or e - TU [ooa]. z= 1+ (^) e-v, " 
if Bw) EROR M | 
fa) Pos] | 

| 


j P laa 4 
ense eue m 
Furthermore, in virtue of (72) i 
F(z)- F(u) = f(u) NES ILE 
= chier nen 


U < v(x) < a. 
In virtue of (24a) it may be deduced that 


y(z)= —1+exp [ 15 "Awas. 


It has also been assumed that products of small terms of order $y(x)\,y(u)$ may be neglected
with respect to y(x) and y(u) themselves, and that * <n(1 +k). 
2·4·2. The limiting formula for $M_{n,m}(x)$ in case $f(x)$ is of Cauchy type; the formula of Fisher–Tippett

As to the fact, that in practical applications (Y. decreases more slowly than the funo- 

tion f(x) itself, when —1<k<0 and c «z «co, we write for (48) (the index n, m has been 


omitted) 


10 = org) C. -r, ia 
or in virtue of (13) 
— V i 
Ma- -el hf. #5] u-, (16) 
erra z 0). 
From (76) it follows that T 
Miz) (J (f£) [14 EENT erra 
1% 7 GH. C. H. LF M 2 
_ 1, Y- VQ) 

g(x) = 1+ "EXE 


if y(x) and y(u) are sufficiently small. After substituting the expressions mentioned in (25), 
(72) and (74) in (77), we deduce after some computation, neglecting the products of small


terms M(z) (1+ k)m-k 
M) (2) ost exp [CEEE nn (9) 
z= 144 (4) (-, 


ala) = (res EO) ema cemere, (10) 


k)-k -1 
pa) = y) Tel, o) = POST y rua. 


It has been supposed that $m \ll n$ and $|k| \ll n(1+k)$.

For $x > u$ it appears that $z > 0$ in virtue of $k < 0$ and $(f'/f)_u < 0$. If, however, $x_{\varepsilon,p} < x < x_1 < u$, $z$ might be negative. This case must be excluded by choosing the lower boundary of the interval $|x - u|$, for which (78) holds good, larger than $x_1$. We remark that $\lim_{x\to\infty} f'/f = 0$.


From (78) and (79) follows: 


THEOREM 2·4·2·1. Let $f(x)$ belong to the Cauchy type and satisfy the conditions of § 0·3. Then an interval $(x_1, x_2)$, $x_{\varepsilon,p} < x_1 < u < x_2 < \infty$, exists for which we may put


M(x) | mdcrn-tik |- (k-+1)m—K yerk 2. n] : (80) 
p ] " 73 k+l 


z= 1+6) (w—u)> 0, —1«k«0, -we«-—y- <0. 
u 


Eventually, $(f'/f)_u$ may be approximated by $-nf(u)/[m(1+k) - k]$ (see (21)). The formula for $M(u)$ has been mentioned in (34). After some calculation and application of (34) we find for the distribution of the excess


e ease m 
If we put $k = -1/l$, $l > 1$, (80) transforms into a more usual form


po — go vexp[( -m-« 5) (. D), z= 1-1 (E) e-u) (80a) 


Let $|\varphi_1(x)|$ and $|y(x)|$ decrease to zero with increasing $x$ ($c < x < \infty$). Then, in virtue of the definitions of $\varphi_1(x)$ and $y(x)$ (see (72), (74) and (79)), it follows that at the boundaries $x_1$ and $x_2$ of the interval for which (80) holds good, $M(x)/M(u)$ decreases if $n$ increases.

The formula (80) is a limiting expression for $M_{n,m}(x)$ if $n \to \infty$. For $m = 1$, (80a) corresponds to the well-known limiting form of Fisher–Tippett.

Remark: If f(x) = av, l> 1, a> 0, c, <x «oo, (80a) is an exact formula, then z = xju. 
2·4·3. The limiting formula for $M_{n,m}(x)$ if $f(x)$ has a finite range

In an analogous way we may derive a corresponding formula for $M(x)/M(u)$ if $f(x)$ has a finite upper boundary and $0 < k < \infty$. In this case we may write (see § 2·4·1)


1 (8) 
* kb —y)— Í 9200 dt 


For the deduction of the corresponding limiting formula for $M(x)/M(u)$ we may also follow the method of § 2·4·2. Then (80) may be derived again, but we have to put for $z$

$$z = \frac{b-x}{b-u}. \qquad (83)$$
E.g. (72) transforms into 


2 = (P=2)""exp | [ow 4j. By = 


b—a 


fso dt 
i92 [re-2- [suo a| 


Because of the approximations required for the deduction of (80), this formula may be applied only for a sufficiently small interval $|u - x|$; if $f(x) = (b-x)^{1/k}$, (80) is an exact formula.


Further on we may remark that for k > 0, lim (4) = —0. 
u 


uo 


2·4·4. The limiting formula for $M_{n,m}(x)$ if $f(x)$ is of exponential type; the formula of Gumbel
Now the distributions are considered for which $k = 0$ and the range is infinite ($b = \infty$). We therefore take the distributions $f(x)$ which belong to the exponential type.
Formula (72) also holds good for $k = 0$. Then $z(x) = 1$ and


Analogously to (73) and (74) we may write 


A 77 LY laua | 
Oo pA] -G, e 
Fæ- Feo = fw) ['dtexp|(F) [i-e fiosa)


T z|- 1 +exp|(4) wl] [ += wi ey) (85) 


yix) = —1+exp C. [a] (x<p(z)<u or 2x>p(x)>u). 


After substituting the expressions (84) and (85) in (77), provided k = 0, the following 
theorem may be formulated: 


THEOREM 2·4·4·1. Let $f(x)$ belong to the exponential type and satisfy the conditions of § 0·3.
For $x$ and $u > x_{\varepsilon,p}$ we may write (the index $n, m$ has been omitted)
10 = pa)exp[ - mt g- (86) 
p(x) = [1+ 9,(x) +(x) yl) [1 gle) exp [q(2) (1 — 67]. (87) 


=- (5) (c—u); gr) - u) T vir): qux) = (m— 1) Wu) + my, (2). 
Again we have neglected the term y(u) v. 
If $1 < \mu(x) < 1 + \varepsilon$ for sufficiently large $x$ and $|u - x|$ sufficiently small, $M(x)/M(u)$ may be approximated by

$$\frac{M(x)}{M(u)} = \exp[-mt - m(e^{-t} - 1)], \qquad t = -\left(\frac{f'}{f}\right)_u (x - u). \qquad (88)$$


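According to (21), we may approximately replace $(f'/f)_u$ by $-nf(u)/m$. In virtue of

$$M(u) \approx -\left(\frac{f'}{f}\right)_u \frac{m^m e^{-m}}{(m-1)!}$$

(see (34)), (88) gives

$$M(x) = -\left(\frac{f'}{f}\right)_u \frac{m^m}{(m-1)!}\exp[-mt - me^{-t}]. \qquad (89)$$

This is the well-known limiting formula for $M_{n,m}(x)$, deduced by Gumbel (1935).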
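Formula (89) can be checked in two ways: it should integrate to one, and for a parent of exponential type it should approach the exact density (15). The sketch assumes the standard exponential parent $f(x) = e^{-x}$, for which $(f'/f)_u = -1$ and (47a) gives $u = \ln(n/m)$; these choices and the function names are illustrative, not the author's.

```python
import math

def gumbel_89(x, u, alpha, m):
    """(89): alpha m^m/(m-1)! exp(-m t - m e^{-t}), t = alpha (x - u), alpha = -(f'/f)_u."""
    t = alpha * (x - u)
    return alpha * m ** m / math.factorial(m - 1) * math.exp(-m * t - m * math.exp(-t))

def exact_exponential(x, n, m):
    """Exact density (15) of the m-th largest of n standard exponential values (x >= 0)."""
    q = math.exp(-x)  # 1 - F(x); for this parent f(x) = q as well
    return n * math.comb(n - 1, m - 1) * q * q ** (m - 1) * (1.0 - q) ** (n - m)

def trapezoid(g, a, b, steps):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, steps)))

if __name__ == "__main__":
    n, m = 1000, 2
    u = math.log(n / m)  # root of (47a) for f(x) = e^{-x}
    print(exact_exponential(u, n, m), gumbel_89(u, u, 1.0, m))
```

With $n = 1000$ and $m = 2$ the exact and limiting densities at the mode differ by about 0·1 %.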
In virtue of the supposition that $|\varphi_1(x)|$, and therefore also $|y(x)|$, decreases to zero in the interval $c < x < \infty$ for increasing $x$, at the bounds of the interval $(x_1, x_2)$ for which $M_{n,m}(x)$ may be approximated by (89), $M_{n,m}(x)/M_{n,m}(u)$ decreases if $n$ increases. In this sense (89) may be called a limiting expression for $M_{n,m}(x)$.
From (89) it follows for the distribution of the excess 
m-l mJ 
E me" 90 
E a fe (90) 
Remarks: 1. For a particular initial distribution $f(x)$ of exponential type, with constants $d, c > 0$, (89) is an exact formula, but for other initial distributions this formula is an approximation. It is then often necessary to take $n$, and thus $u$, very large in order to obtain a favourable approximation over a sufficiently large interval $|x - u|$. As an example we mention the distribution of the extreme values ($m = 1$) taken from a normal distribution. It converges very slowly toward the asymptotic distribution as the sample


Biom. 45 
size increases. This also appears from the computed cases of Fig. 3: even there the deviations are too large in comparison with the other approximate formulae. This also appears from an examination of the function $\mu(x)$ (see (87)). For the

normal distribution and m = 1 it follows from (38a) (84) and (85), that $3) = 


"dt ele 

ITI) = Se ae . 

Applying the well-known asymptotic expansion (14), it appears after some computation, which at this place we leave out of consideration, that only for very large values of $n$ does a sufficiently large interval $|x - u|$ exist for which $\varphi_1(x)$ and $\varphi_2(x)$ are sufficiently small, and for which $y(x)$ and $y(u)$ may be neglected with respect to $\varphi_1(x)$ and $\varphi_2(x)$. The convergence toward the asymptotic distribution (89) can be improved by an adequate choice of the parameters (see de Finetti (1932) and Gumbel (1954)).

2. We may deduce (89) from (80) by taking the limit for $k \to 0$.


3. Concerning the case when $k = 0$ and the upper bound $b$ is finite, we may note that, according to the treatment in § 2·4·3 and § 2·4·4, formula (89) may also be applied in this case, provided that for $|x - u| < \delta$ the deviation between $\mu(x)$ and unity is sufficiently small

(see (87)). E.g. if f(z) = e-1-2. 0 — x <b and wis sufficiently large we may 


mm 1 m —u) r-u 
ee e p E|- Boa eP e 
THE SAMPLING VARIANCE OF CORRELATION COEFFICIENTS
UNDER ASSUMPTIONS OF FIXED AND MIXED VARIATES*


By JOHN W. HOOPER 


1. INTRODUCTION


This paper is concerned with the asymptotic variances of canonical correlation coefficients under alternative assumptions about the stochastic nature of the variables. The results obtained also apply to zero-order and multiple correlation coefficients, since these are special cases of canonical correlations. The model and the assumptions are presented in § 2, the results in § 3, and some interpretations of the results in § 4. Detailed proofs of the results are given in the Appendix.


2. THE MODEL AND ASSUMPTIONS 


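We assume that there are $M$ variables $y_1, \ldots, y_M$ and $\Lambda$ variables $x_1, \ldots, x_\Lambda$, and that $T$ vectors of observations on these variables are available:

$$(y_{1t}, \ldots, y_{Mt};\ x_{1t}, \ldots, x_{\Lambda t}) \qquad (t = 1, \ldots, T). \qquad (2{\cdot}1)$$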


We shall consider the canonical correlations between the $y$'s on the one hand and the $x$'s on the other. The assumption which underlies classical correlation theory is the following:
(i) The $T$ vectors (2·1) are independent random drawings from an $(M + \Lambda)$-dimensional normal parent with zero means.
An alternative approach can be based on a system of stochastic linear relations between each of the $y$'s and all of the $x$'s:

$$y_{\mu t} = \sum_{\lambda=1}^{\Lambda}\pi_{\mu\lambda}x_{\lambda t} + u_{\mu t} \qquad (\mu = 1, \ldots, M;\ t = 1, \ldots, T), \qquad (2{\cdot}2)$$

where $u_{\mu t}$ is a disturbance and $\pi_{\mu\lambda}$ a coefficient which is independent of $t$. The system (2·2), for $\mu = 1, \ldots, M$, can be regarded as the reduced form of an econometric equation system, where the $y$'s are the jointly dependent and the $x$'s the predetermined variables. Consider then:
(ii) The $\Lambda T$ values $x_{\lambda t}$ are all non-stochastic real numbers. The $T$ disturbance vectors $(u_{1t}, \ldots, u_{Mt})$ are independent random drawings from an $M$-dimensional normal parent with zero means, and the $y_{\mu t}$ (for $\mu = 1, \ldots, M$ and $t = 1, \ldots, T$) are determined by (2·2), the $\pi_{\mu\lambda}$ being parameters independent of $t$.

This is the case in which the $x$-variates are 'fixed variates'. We proceed to a more general situation, viz. that of 'mixed variates', which covers assumptions (i) and (ii) as special cases. Assume that

$$x_{\lambda t} = \xi_{\lambda t} + w_{\lambda t}, \qquad (2{\cdot}3)$$


* The author is grateful to H. Theil for helpful suggestions.
where $w_{\lambda t}$ is stochastic but $\xi_{\lambda t}$ non-stochastic and the $w$'s are independent. More precisely, we assume:

(iii) For $\lambda = 1, \ldots, \Lambda$ and $t = 1, \ldots, T$, $x_{\lambda t}$ is given by (2·3), where $\xi_{\lambda t}$ is a non-stochastic real number and $w_{\lambda t}$ is stochastic in such a way that the $T$ vectors $(w_{1t}, \ldots, w_{\Lambda t})$ are independent random drawings from a $\Lambda$-dimensional normal parent with zero means. These vectors are independent of the $T$ disturbance vectors $(u_{1t}, \ldots, u_{Mt})$, which are themselves independent random drawings from an $M$-dimensional normal parent with zero means. The $y_{\mu t}$ are given by (2·2), the $\pi_{\mu\lambda}$ being parameters independent of $t$.

It is easily seen that (iii) contains (i) and (ii) as limiting cases: if $w_{\lambda t} = 0$ for all pairs $(\lambda, t)$, then the $x$-variates are fixed, so we are in case (ii); if $\xi_{\lambda t} = 0$ for all pairs $(\lambda, t)$, then the $x$'s and $y$'s, for each $t$, are subject to a joint multinormal distribution, as in case (i).


3. THE ASYMPTOTIC SAMPLING VARIANCES 


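In order to derive the variances and covariances of the canonical correlations under the assumption of mixed variates we write

$$\sigma_{\lambda\lambda'} = \sum_{t=1}^{T}x_{\lambda t}x_{\lambda' t}, \qquad s_{\mu\mu'} = \sum_{t=1}^{T}y_{\mu t}y_{\mu' t}, \qquad c_{\lambda\mu} = \sum_{t=1}^{T}x_{\lambda t}y_{\mu t}. \qquad (3{\cdot}1)$$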


We then have under the assumptions of canonical correlation theory 


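$$\sum_{\lambda}\sum_{\lambda'}\sigma_{\lambda\lambda'}h_\lambda h_{\lambda'} = 1; \qquad \sum_{\mu}\sum_{\mu'}s_{\mu\mu'}k_\mu k_{\mu'} = 1; \qquad \sum_{\lambda}\sum_{\mu}c_{\lambda\mu}h_\lambda k_\mu = r, \qquad (3{\cdot}2)$$

where $h_\lambda$ ($\lambda = 1, \ldots, \Lambda$) and $k_\mu$ ($\mu = 1, \ldots, M$) are the coefficients that are to be determined in order to transform the $x$'s and the $y$'s, respectively, to canonical variates, and $r$ is a canonical correlation.* Taking differentials, we find

$$2\sum_{\lambda}\sum_{\lambda'}\sigma_{\lambda\lambda'}h_\lambda\,dh_{\lambda'} + \sum_{\lambda}\sum_{\lambda'}h_\lambda h_{\lambda'}\,d\sigma_{\lambda\lambda'} = 0,$$
$$2\sum_{\mu}\sum_{\mu'}s_{\mu\mu'}k_\mu\,dk_{\mu'} + \sum_{\mu}\sum_{\mu'}k_\mu k_{\mu'}\,ds_{\mu\mu'} = 0, \qquad (3{\cdot}3)$$
$$\sum_{\lambda}\sum_{\mu}c_{\lambda\mu}h_\lambda\,dk_\mu + \sum_{\lambda}\sum_{\mu}c_{\lambda\mu}k_\mu\,dh_\lambda + \sum_{\lambda}\sum_{\mu}h_\lambda k_\mu\,dc_{\lambda\mu} = dr.$$

We may now assume without loss of generality that all variables are in canonical form,† which means that all $h$'s and $k$'s vanish except for one pair, $h_1$ and $k_1$ say, which are both equal to 1. Then (taking account that $\sigma_{11} = s_{11} = 1$, $c_{11} = r_1$) we can simplify as follows:

$$2\,dh_1 + d\sigma_1 = 0; \qquad 2\,dk_1 + ds_1 = 0; \qquad r_1\,dk_1 + r_1\,dh_1 + dc_1 = dr_1, \qquad (3{\cdot}4)$$

where $\sigma_1$ stands for $\sigma_{11}$, $s_1$ for $s_{11}$, and $c_1$ for $c_{11}$. From (3·4) we find

$$dr_1 = dc_1 - \tfrac{1}{2}r_1(ds_1 + d\sigma_1), \qquad (3{\cdot}5)$$

and likewise for any other canonical correlation, say $r_2$,

$$dr_2 = dc_2 - \tfrac{1}{2}r_2(ds_2 + d\sigma_2). \qquad (3{\cdot}6)$$

* For a more detailed account of canonical correlation theory see Hotelling (1936) and Kendall (1955, pp. 348–58).
† In order to do this we disregard, as Hotelling did, any multiple or zero roots in the population.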
Squaring (3·5) and taking expected values (omitting the subscripts), we obtain to the order of $T^{-1}$

$$\operatorname{var} r = \operatorname{var} c + \tfrac{1}{4}\rho^2[\operatorname{var} s + 2\operatorname{cov}(s,\sigma) + \operatorname{var}\sigma] - \rho[\operatorname{cov}(c,s) + \operatorname{cov}(c,\sigma)], \qquad (3{\cdot}7)$$

where $\rho$ is the parent canonical correlation and the other variables are defined as*

$$c = \sum_{t=1}^{T}(\xi_t + w_t)\,y_t; \qquad s = \sum_{t=1}^{T}y_t^2; \qquad \sigma = \sum_{t=1}^{T}(\xi_t + w_t)^2. \qquad (3{\cdot}8)$$

We notice that $c$ contains a non-stochastic variable $\xi$ and a stochastic variable $w$. We can define their respective variances as ($E$ being the expected-value operator)

$$\phi = E\left(\sum_t \xi_t^2\right); \qquad 1 - \phi = E\left(\sum_t w_t^2\right), \qquad (3{\cdot}9)$$

since the expected sum of squares is 1. An evaluation of the terms in (3·7) then gives


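$$\operatorname{var} c = \frac{1}{T}\{1 + \rho^2(1 - 2\phi^2)\}; \qquad \operatorname{var} s = \frac{2}{T}(1 - \rho^4\phi^2);$$
$$\operatorname{var}\sigma = \frac{2}{T}(1 - \phi^2); \qquad \operatorname{cov}(s,\sigma) = \frac{2\rho^2}{T}(1 - \phi^2); \qquad (3{\cdot}10)$$
$$\operatorname{cov}(c,s) = \frac{2\rho}{T}(1 - \rho^2\phi^2); \qquad \operatorname{cov}(c,\sigma) = \frac{2\rho}{T}(1 - \phi^2).$$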


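When these results are substituted in (3·7) we obtain:†

THEOREM. Under assumption (iii) the asymptotic sampling variance of a sample canonical correlation is

$$\operatorname{var} r = \frac{1}{2T}(1 - \rho^2)^2(2 - \phi^2\rho^2). \qquad (3{\cdot}11)$$

Under the same assumption the asymptotic sampling covariance of any pair of canonical correlations is

$$\operatorname{cov}(r_1, r_2) = 0. \qquad (3{\cdot}12)$$

The second part of the theorem is proved by multiplying (3·5) and (3·6) and taking expected values. We obtain

$$\operatorname{cov}(r_1, r_2) = \operatorname{cov}(c_1, c_2) - \tfrac{1}{2}\rho_2[\operatorname{cov}(c_1, s_2) + \operatorname{cov}(c_1, \sigma_2)] - \tfrac{1}{2}\rho_1[\operatorname{cov}(c_2, s_1) + \operatorname{cov}(c_2, \sigma_1)]$$
$$\qquad + \tfrac{1}{4}\rho_1\rho_2[\operatorname{cov}(s_1, s_2) + \operatorname{cov}(s_1, \sigma_2) + \operatorname{cov}(\sigma_1, s_2) + \operatorname{cov}(\sigma_1, \sigma_2)]. \qquad (3{\cdot}13)$$

When this expression is evaluated we obtain (3·12) to the order of $T^{-1}$.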
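Because the algebra leading from (3·7) and (3·10) to (3·11) is easy to get wrong, here is a small numerical check: a sketch that plugs the (3·10) moments into (3·7) and compares the result with (3·11) on a grid of $(\rho, \phi)$ values. The function names are hypothetical.

```python
def var_r_from_components(rho: float, phi: float, T: float) -> float:
    """Combine the (3.10) moments through (3.7)."""
    var_c = (1 + rho ** 2 * (1 - 2 * phi ** 2)) / T
    var_s = 2 * (1 - rho ** 4 * phi ** 2) / T
    var_sigma = 2 * (1 - phi ** 2) / T
    cov_s_sigma = 2 * rho ** 2 * (1 - phi ** 2) / T
    cov_c_s = 2 * rho * (1 - rho ** 2 * phi ** 2) / T
    cov_c_sigma = 2 * rho * (1 - phi ** 2) / T
    return (var_c + 0.25 * rho ** 2 * (var_s + 2 * cov_s_sigma + var_sigma)
            - rho * (cov_c_s + cov_c_sigma))

def var_r_theorem(rho: float, phi: float, T: float) -> float:
    """Closed form (3.11)."""
    return (1 - rho ** 2) ** 2 * (2 - phi ** 2 * rho ** 2) / (2 * T)

if __name__ == "__main__":
    print(var_r_from_components(0.5, 0.5, 1.0), var_r_theorem(0.5, 0.5, 1.0))
```

For $\rho = \phi = 0{\cdot}5$ and $T = 1$ both routes give 0·544921875.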


4. INTERPRETATION OF THE RESULTS 


In addition to the general results given in (3·11) and (3·12) there are several special cases to consider. When the number of dependent variables is one ($M = 1$) and the number of independent variables is also one ($\Lambda = 1$), the canonical correlation becomes the zero-order correlation. When $\Lambda > M = 1$ the canonical correlation is the multiple correlation coefficient. Considering these two cases for $\phi = 0$ we find*

$$\operatorname{var} r = \frac{1}{T}(1 - \rho^2)^2; \qquad \operatorname{var} R^2 = \frac{4R^2}{T}(1 - R^2)^2, \qquad (4{\cdot}1)$$

where $R^2$ on the left is the sample squared multiple correlation coefficient and $R^2$ on the right is the corresponding parameter in the population. These are the usual asymptotic results for correlation theory and correspond to the case under assumption (i).†

* Since all variables are in canonical form, the $y_t$ and $x_t$ in (3·8) are linear combinations of the $y$'s and $x$'s defined in (2·1) and (2·3).
† The proof of this theorem is given in the Appendix.

When $\phi = 1$ we find


$$\operatorname{var} r = \frac{1}{2T}(1 - \rho^2)^2(2 - \rho^2); \qquad \operatorname{var} R^2 = \frac{2R^2}{T}(1 - R^2)^2(2 - R^2). \qquad (4{\cdot}2)$$

These results correspond to the case of assumption (ii), where the independent variables are non-stochastic.‡ The variances in (4·2) will always be less than the corresponding variances in (4·1), except in the limiting cases $\rho = R = 0$ and $\rho = R = 1$. Intuitively this seems reasonable, since that part of the sampling variation due to the sampling variation in the independent variables is eliminated when they are non-stochastic.

The results for the zero-order and multiple correlation coefficients for $0 < \phi < 1$ are given by

$$\operatorname{var} r = \frac{1}{2T}(1 - \rho^2)^2(2 - \phi^2\rho^2); \qquad \operatorname{var} R^2 = \frac{2R^2}{T}(1 - R^2)^2(2 - R^2\phi^2). \qquad (4{\cdot}3)$$

From (4·3) it is seen that the sampling variances are continuous functions of $\phi$, $\rho$ and $T$. In Tables 1 and 2 various values of the variances of $r$ and $R^2$ are given for a constant sample size $T$. It may be noted that the relative change in $\operatorname{var} r$ as $\phi$ varies, for constant $\rho^2$ and $T$, is an increasing function of $\rho^2$. Thus for $\rho^2 \ge 0.9$ the relative change over $0 \le \phi \le 1$ is greater than 50 %. This indicates that the correct specification as to whether the model conforms to assumption (i) or (ii) may be important in applications where such large values of $\rho^2$ are obtained.

Table 1. Values of $T\operatorname{var} r = \tfrac{1}{2}(1 - \rho^2)^2(2 - \phi^2\rho^2)$ for various values of $\rho^2$ and $\phi^2$

φ² \ ρ²    0       0.1      0.2      0.3      0.4      0.5      0.6      0.7      0.8
0.0     1.0000  0.81000  0.64000  0.49000  0.36000  0.25000  0.16000  0.09000  0.04000
0.1     1.0000  0.80595  0.63360  0.48265  0.35280  0.24375  0.15520  0.08685  0.03840
0.2     1.0000  0.80190  0.62720  0.47530  0.34560  0.23750  0.15040  0.08370  0.03680
0.3     1.0000  0.79785  0.62080  0.46795  0.33840  0.23125  0.14560  0.08055  0.03520
0.4     1.0000  0.79380  0.61440  0.46060  0.33120  0.22500  0.14080  0.07740  0.03360
0.5     1.0000  0.78975  0.60800  0.45325  0.32400  0.21875  0.13600  0.07425  0.03200
0.6     1.0000  0.78570  0.60160  0.44590  0.31680  0.21250  0.13120  0.07110  0.03040
0.7     1.0000  0.78165  0.59520  0.43855  0.30960  0.20625  0.12640  0.06795  0.02880
0.8     1.0000  0.77760  0.58880  0.43120  0.30240  0.20000  0.12160  0.06480  0.02720
0.9     1.0000  0.77355  0.58240  0.42385  0.29520  0.19375  0.11680  0.06165  0.02560
1.0     1.0000  0.76950  0.57600  0.41650  0.28800  0.18750  0.11200  0.05850  0.02400

* We here make use of the fact (for $\operatorname{var} R^2$) that $\operatorname{var} r^2 = 4\rho^2\operatorname{var} r$.
† Cf. Kendall (1952, pp. 336, 385).
‡ Cf. Hooper (1958).

Table 2. Values of $T\operatorname{var} R^2 = 2R^2(1 - R^2)^2(2 - R^2\phi^2)$ for various values of $R^2$ and $\phi^2$

φ² \ R²   0   0.1      0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9
0.0       0  0.32400  0.51200  0.58800  0.57600  0.50000  0.38400  0.25200  0.12800  0.03600
0.1       0  0.32238  0.50688  0.57918  0.56448  0.48750  0.37248  0.24318  0.12288  0.03438
0.2       0  0.32076  0.50176  0.57036  0.55296  0.47500  0.36096  0.23436  0.11776  0.03276
0.3       0  0.31914  0.49664  0.56154  0.54144  0.46250  0.34944  0.22554  0.11264  0.03114
0.4       0  0.31752  0.49152  0.55272  0.52992  0.45000  0.33792  0.21672  0.10752  0.02952
0.5       0  0.31590  0.48640  0.54390  0.51840  0.43750  0.32640  0.20790  0.10240  0.02790
0.6       0  0.31428  0.48128  0.53508  0.50688  0.42500  0.31488  0.19908  0.09728  0.02628
0.7       0  0.31266  0.47616  0.52626  0.49536  0.41250  0.30336  0.19026  0.09216  0.02466
0.8       0  0.31104  0.47104  0.51744  0.48384  0.40000  0.29184  0.18144  0.08704  0.02304
0.9       0  0.30942  0.46592  0.50862  0.47232  0.38750  0.28032  0.17262  0.08192  0.02142
1.0       0  0.30780  0.46080  0.49980  0.46080  0.37500  0.26880  0.16380  0.07680  0.01980
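Both tables follow directly from (4·3). The sketch below regenerates them from the closed forms, which is a convenient way to check individual entries of the scanned tables; the function names are illustrative.

```python
def t_var_r(rho2: float, phi2: float) -> float:
    """T var r = (1/2)(1 - rho^2)^2 (2 - phi^2 rho^2), from (4.3)."""
    return 0.5 * (1.0 - rho2) ** 2 * (2.0 - phi2 * rho2)

def t_var_R2(R2: float, phi2: float) -> float:
    """T var R^2 = 4 R^2 * T var R = 2 R^2 (1 - R^2)^2 (2 - R^2 phi^2)."""
    return 4.0 * R2 * t_var_r(R2, phi2)

def table(fn, n_cols: int):
    """Rows: phi^2 = 0.0, ..., 1.0; columns: rho^2 (or R^2) = 0.0, 0.1, ..."""
    grid = [i / 10 for i in range(11)]
    cols = [j / 10 for j in range(n_cols)]
    return [[fn(c, g) for c in cols] for g in grid]

if __name__ == "__main__":
    for name, fn, n_cols in (("Table 1", t_var_r, 9), ("Table 2", t_var_R2, 10)):
        print(name)
        for i, row in enumerate(table(fn, n_cols)):
            print(f"{i / 10:3.1f} " + " ".join(f"{v:8.5f}" for v in row))
```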

Finally, it appears that the asymptotic variance of the zero-order ($M = \Lambda = 1$) regression coefficient is independent of $\phi$.* This agrees with the well-known fact that the variance is identical for the two limiting cases $\phi = 0$ and $\phi = 1$. Thus, we see that while the choice
of models corresponding to assumptions (i), (ii), or (iii) makes no difference to the asymptotic 
variance of the zero-order regression coefficient it may lead to a considerable difference in 
the variance of the zero-order correlation coefficient. 
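A rough Monte Carlo experiment can illustrate the effect of the fixed/mixed specification for $M = \Lambda = 1$. The set-up below is an assumption made for illustration only: $x_t = \xi_t + w_t$ with a fixed sequence $\xi_t$ of mean square $\phi$, $w_t \sim N(0, 1-\phi)$, $y_t = \rho x_t + u_t$ with $u_t \sim N(0, 1-\rho^2)$, and $r$ computed from raw (uncentred) sums as in (3·1). With $T = 200$ the simulated $T\operatorname{var} r$ should be near (3·11), and smaller for fixed than for random regressors.

```python
import math
import random

def simulate_t_var_r(rho, phi, T=200, reps=2000, seed=12345):
    """Monte Carlo estimate of T * var(r) under the hypothetical set-up above."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(T)]
    scale = math.sqrt(phi * T / sum(v * v for v in z))
    xi = [scale * v for v in z]          # non-stochastic part, fixed across replications
    sd_w = math.sqrt(1.0 - phi)          # stochastic part of x
    sd_u = math.sqrt(1.0 - rho * rho)    # disturbance
    rs = []
    for _ in range(reps):
        sxx = sxy = syy = 0.0
        for t in range(T):
            x = xi[t] + rng.gauss(0.0, sd_w)
            y = rho * x + rng.gauss(0.0, sd_u)
            sxx += x * x
            sxy += x * y
            syy += y * y
        rs.append(sxy / math.sqrt(sxx * syy))
    mean = sum(rs) / reps
    return T * sum((r - mean) ** 2 for r in rs) / (reps - 1)

def t_var_r_theory(rho, phi):
    """(3.11) multiplied by T."""
    return 0.5 * (1.0 - rho ** 2) ** 2 * (2.0 - phi ** 2 * rho ** 2)

if __name__ == "__main__":
    for phi in (0.0, 1.0):
        print(phi, simulate_t_var_r(0.6, phi), t_var_r_theory(0.6, phi))
```

Simulation noise with 2000 replications is a few per cent, so only rough agreement should be expected.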


REFERENCES
HOOPER, J. W. (1958). Simultaneous equations and canonical correlation theory. Report 5806 of the Econometric Institute of the Netherlands School of Economics.
HOTELLING, H. (1936). Relations between two sets of variates. Biometrika, 28, 321–77.
KENDALL, M. G. (1952). The Advanced Theory of Statistics, 1, 5th ed. London: Charles Griffin.
KENDALL, M. G. (1955). The Advanced Theory of Statistics, 2, 3rd ed. London: Charles Griffin.

* Cf. Appendix below for the proof of this result.
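APPENDIX
We first prove the results given in (3·10). We write for our model in canonical form

$$y_t = \rho(\xi_t + w_t) + u_t. \qquad (\mathrm{A}.1)$$

Substituting (A.1) in (3·8) we obtain

$$c = \sum(\xi_t + w_t)[\rho(\xi_t + w_t) + u_t];$$
$$s = \sum[\rho^2(\xi_t + w_t)^2 + u_t^2 + 2\rho(\xi_t + w_t)u_t]; \qquad (\mathrm{A}.2)$$
$$\sigma = \sum(\xi_t + w_t)^2.$$

Taking expected values, and remembering that all terms containing $w_t$ or $u_t$ linearly vanish since

$$E(w_t u_t) = E(w_t) = E(u_t) = 0,$$

we find

$$E(c) = \rho\sum\xi_t^2 + \rho(1 - \phi); \qquad E(s) = \rho^2\sum\xi_t^2 + (1 - \rho^2\phi); \qquad E(\sigma) = \sum\xi_t^2 + (1 - \phi). \qquad (\mathrm{A}.3)$$

Subtracting the expressions in (A.3) from the corresponding ones in (A.2) we find

$$dc = 2\rho\sum\xi_t w_t + \sum[\xi_t u_t + \rho w_t^2 + w_t u_t] - \rho(1 - \phi);$$
$$ds = 2\sum[\rho^2\xi_t w_t + \rho\xi_t u_t + \rho w_t u_t] + \sum[\rho^2 w_t^2 + u_t^2] - (1 - \rho^2\phi); \qquad (\mathrm{A}.4)$$
$$d\sigma = 2\sum\xi_t w_t + \sum w_t^2 - (1 - \phi).$$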
Squaring and taking the expected value of the first expression in (A. 4) we obtain 
varo = H[4p>(SE,w,)? (XE u, + pEwj + Ewu)? pal) ' 
+ E[4pX£w, X(E uf put + w,uj) — Ap*(1 — p) E&w,]— E[2p(1 — p) NEU + pwt . (4-5) 
If we now omit all zero terms (a term is zero when it contains w, or w linearly) we find 


var o = 4% D Ebene) + 2B Euibeur) 
+p? X Bw! wb) + X (uuu up) — 2p%1 —p) EE(ut) . HUI, (8.9) 
tt i,t’ 


The evaluation of all terms except the third is straightforward. For the third term we have 


pr Badu) = px Bett) + p* E Bwt) = gu E ü- E TUNE 2 az pi, 


making use of the fact that the fourth moment of a normal variate is three times its squared variance. So we obtain*


varo = T ptp(1— p) 2 750 py Tete 


(1p)? 


+7 (1=P) (1 p)— 2941 - prp -p li, (f 


Squaring the second expression in (A. 4) and taking expected values (omitting zero terms) we find 
vars = E(4p*(p*( E wj)* + (E£ uj)? Nu) + 2% N Xu]) 
2 2 
EU [pd wi dui] — 201 — ptp) AU + u1]) + (1 —ptp) = (PP un 
Squaring the third expression in (A. 4) and taking expected values (omitting zero terms) we have 
var q = E(4(E£w,)* + (Ewi) -2(1 ) Ew? + (1 —p)? 


2 A.9) 
250-2»). 
* Henceforth, summation will be over $t = 1, \ldots, T$ unless otherwise specified.

† A term may also vanish because it contains $w_t^3$ or $u_t^3$, which lead to the third moments of normal variates.
Multiplying the first and second expressions of (A. 4) and taking expected values (omitting zero terms)


we obtain 
cov (es) = EPEE)" + 20( £u) NR )* + 26 Xue, u,)* Nu Eu] 
: — Eip(1 — p*p) Euj — p(1— p) Eu] — p(1 — p) Ef] + (1 — p) (1 — p*p) 
%. (A. 10) 


Multiplying the first and third expressions of (A. 4) and taking expected values (omitting zero terms) 
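we obtain

$$\operatorname{cov}(c,\sigma) = E\left[4\rho\left(\sum\xi_t w_t\right)^2 + \rho\left(\sum w_t^2\right)^2 - 2\rho(1-\phi)\sum w_t^2\right] + \rho(1-\phi)^2 = \frac{2\rho}{T}(1 - \phi^2). \qquad (\mathrm{A}.11)$$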


Multiplying the second and third expressions of (A. 4) and taking expected values (omitting zero terms) 


we find 
cov (8,0) = E[4p( ££ wj + p(Euj)* + Lu Eu; ] 
— E(p*(1 — p) Sw} — (1 — p) uf — (1 — p*p) uj] + (1 —p*p) (1 — p) 
- 750 —p*). (A. 12) 


This proves the results given in (3.10) and hence the first part of the theorem.

In order to derive the covariance between any two canonical correlations, say $r_1$ and $r_2$, we can regard the expressions in (A.4) as applying to $r_1$ by attaching the subscript 1 to each variable and parameter appearing in them. Similarly, we have for $r_2$:

deg = 2p, £y wy + DE tine . atu] — pall — pi) 
ds, = 2 (p Eg pa + Pan tae) + EL pa wis + ur] — (0 pa py): (A. 13) 
do, = 2X£, wa Ewi — (1 —p,). 
Multiplying the three expressions in (A.4) by each of the three expressions in (A.13) and taking expected values, we obtain the nine covariance terms of (3.13). Applying this procedure to $dc_1$ and $dc_2$, we have (omitting zero terms)
COV (c1, ea) = EL py pa Xu, Dw, — pi p21 — pı) Lui, — pip —pi) Sw) + pspi(1 — p) (1.—p3) = 0. (A. 14) 


In a similar manner, as the reader may easily verify, all other covariance terms in (3.13) are zero, which proves the second part of the theorem.

For the asymptotic sampling variance of the zero-order regression coefficient $b$ we have


varb — varc varg 2 cov (e, 7) t (A. 15) 

8 i Cp C UNO Elo (co) a 

Upon substituting from (A.3), (A.7), (A.9) and (A.11), we obtain the familiar expression ($\beta$ being the population parameter)

$$\operatorname{var} b = \frac{1}{T}\,\beta^2\,\frac{1-\rho^2}{\rho^2},$$

all terms containing $p$ having vanished. This proves the remark made at the end of §4.
THE MEAN DEVIATION, WITH SPECIAL REFERENCE TO SAMPLES FROM A PEARSON TYPE III POPULATION

By N. L. JOHNSON

University College London
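1. Let $x_1, x_2, \ldots, x_n$ be independent random variables, each following the Type III distribution

$$p(x) = \frac{x^{\alpha-1}e^{-x}}{\Gamma(\alpha)} \qquad (x \geq 0),$$

which has moments and moment ratios

$$\mathcal{E}(x) = \alpha = \operatorname{var}(x); \qquad \beta_1(x) = 4/\alpha; \qquad \beta_2(x) = 3 + 6/\alpha.$$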
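The moment ratios quoted for the Type III distribution follow from its moments about zero, $\mathcal{E}(x^k) = \Gamma(\alpha + k)/\Gamma(\alpha)$. A short Python sketch (standard library only; the function name is illustrative) converts these to central moments and recovers $\beta_1 = 4/\alpha$ and $\beta_2 = 3 + 6/\alpha$.

```python
import math

def type_iii_moment_ratios(alpha: float) -> tuple[float, float, float, float]:
    """Return (mean, variance, beta1, beta2) for p(x) = x^(alpha-1) e^(-x) / Gamma(alpha)."""
    # Moments about zero E(x^k) = Gamma(alpha + k) / Gamma(alpha), via log-gamma.
    raw = [math.exp(math.lgamma(alpha + k) - math.lgamma(alpha)) for k in range(5)]
    mean = raw[1]
    # Central moments from moments about zero.
    mu2 = raw[2] - mean ** 2
    mu3 = raw[3] - 3 * mean * raw[2] + 2 * mean ** 3
    mu4 = raw[4] - 4 * mean * raw[3] + 6 * mean ** 2 * raw[2] - 3 * mean ** 4
    return mean, mu2, mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2

for a in (1.0, 2.0, 5.0):
    print(a, type_iii_moment_ratios(a))
```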
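The joint probability density function of $x_1, x_2, \ldots, x_n$ is

$$p(x_1, x_2, \ldots, x_n) = \{\Gamma(\alpha)\}^{-n}\Big(\prod_{j=1}^{n} x_j\Big)^{\alpha-1}\exp\Big(-\sum_{j=1}^{n} x_j\Big) \qquad (x_j \geq 0).$$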
We now make the transformation 


1 E. 
nl j 1-2) = x,- = e ee? 7 
n 
1% 1% (28.24) L 
which has the Jacobian 
Rated | n E 
CTET ee i 
We find 


p (t, Ug, ++ 


{(n — I/ e ο 1” a-l n 
vt) Tae ( , 
Hence, for w, » 0 


I [uj- Le- erga ( Eu» . (2) 
pis,)= {(n— (n— D/nye7ns.. 


j 
G “fr 95 615 E. J (i Ad 4 


Applying Dirichlet’s formula, we have 


105 _ {(n- 1)/nj— Da 


s oo t a-l NA 
FU Da)* Jk (u+) [n0 21 t, 
where L=0 if u,>0, 
L= n 
Hence if u, > 0, and provided æ is an integer 
n—1 na- Da aml / — 
—-]yo-»0a a-1(p — — 
ie.  p(u)- - et Y (n — 1) a[(n — 1) a 1].. 


[(n— 1) a r— 1] ug 
— wr! (a — 1 —rn) 


if u,<0. 
The mean deviation of the $n$ values $x_1, x_2, \ldots, x_n$ is $m = \frac{1}{n}\sum_{j=1}^{n}|x_j - \bar{x}|$, and the expected value of $m$ is
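$$\mathcal{E}(m) = \mathcal{E}(|x_1 - \bar{x}|) = \mathcal{E}(|u_1|).$$

Since $\mathcal{E}(u_1) = 0$,

$$\mathcal{E}(m) = 2\,\mathcal{E}(u_1 \mid u_1 > 0)\Pr\{u_1 > 0\} = 2\int_0^{\infty} u_1\,p(u_1)\,du_1,$$

and on evaluating this integral term by term we find

$$\mathcal{E}(m) = \frac{2}{n^{\alpha}(\alpha-1)!}\Big(\frac{n-1}{n}\Big)^{(n-1)\alpha}(n\alpha-1)(n\alpha-2)\cdots(n\alpha-\alpha) \qquad (4)$$

for $\alpha \geq 2$, while for $\alpha = 1$

$$\mathcal{E}(m) = 2\Big(\frac{n-1}{n}\Big)^{n}. \qquad (4)'$$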
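Formula (4) is easy to evaluate numerically. The sketch below (Python, standard library) uses the gamma-function form $\mathcal{E}(m) = 2(n-1)^{(n-1)\alpha}\,\Gamma(n\alpha)/\{n^{n\alpha}\,\Gamma(\alpha)\,\Gamma((n-1)\alpha)\}$, which for integer $\alpha$ reduces to (4) because $\Gamma(n\alpha)/\Gamma((n-1)\alpha) = (n\alpha-1)(n\alpha-2)\cdots(n\alpha-\alpha)$. This rewriting is ours, not taken from the text. Dividing by $\sigma = \sqrt{\alpha}$ gives the ratio tabulated in Table 1, and working with log-gamma keeps the large powers numerically stable.

```python
import math

def expected_md_ratio(n: int, alpha: float) -> float:
    """E(m)/sigma for a sample of size n (n >= 2) from p(x) = x^(alpha-1) e^(-x) / Gamma(alpha).

    E(m) = 2 (n-1)^((n-1)alpha) Gamma(n alpha) / (n^(n alpha) Gamma(alpha) Gamma((n-1)alpha)),
    the gamma-function form of (4); sigma = sqrt(alpha).
    """
    log_em = (math.log(2.0)
              + (n - 1) * alpha * math.log(n - 1)
              + math.lgamma(n * alpha)
              - n * alpha * math.log(n)
              - math.lgamma(alpha)
              - math.lgamma((n - 1) * alpha))
    return math.exp(log_em) / math.sqrt(alpha)

# Some entries of Table 1, rounded to four decimals.
for n, alpha in [(3, 1), (3, 2), (4, 2), (8, 2), (10, 1)]:
    print(n, alpha, round(expected_md_ratio(n, alpha), 4))
```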
For large $n$, expansion in powers of $n^{-1}$ gives

2 1 

3 2a9e-* 
and 3 = (=I). E (6) 


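By analogy with the Normal case, and also considering the general relation

$$\operatorname{var}(u_1) = \frac{n-1}{n}\operatorname{var}(x_1),$$

the alternative approximation

$$\mathcal{E}(m) \approx \frac{2\alpha^{\alpha}e^{-\alpha}}{\Gamma(\alpha)}\Big(1 - \frac{1}{n}\Big)^{1/2} \qquad (5)'$$

is suggested.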
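A matching sketch for approximation (5)': the first factor, $2\alpha^{\alpha}e^{-\alpha}/\Gamma(\alpha)$, is the population mean deviation of the Type III distribution, and (5)' scales it by $(1 - 1/n)^{1/2}$. Dividing by $\sigma = \sqrt{\alpha}$ should agree with Table 2 to within about one unit in the last decimal (the printed table appears to have been computed from rounded population values). Function names are illustrative.

```python
import math

def population_md_ratio(alpha: float) -> float:
    """Population mean deviation / sigma for the Type III distribution."""
    log_md = math.log(2.0) + alpha * math.log(alpha) - alpha - math.lgamma(alpha)
    return math.exp(log_md) / math.sqrt(alpha)

def approx_md_ratio(n: int, alpha: float) -> float:
    """Approximation (5)' divided by sigma: population ratio times sqrt(1 - 1/n)."""
    return population_md_ratio(alpha) * math.sqrt(1.0 - 1.0 / n)

for n in (3, 6, 10):
    print(n, round(approx_md_ratio(n, 1.0), 4), round(approx_md_ratio(n, 2.0), 4))
```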


Table 1, below, shows some typical values of the ratio of the expected value of the mean deviation to the standard deviation ($\sigma$) of the distribution. Table 2 gives approximate values of this ratio calculated from (5)', which is a considerably better approximation formula than (5).
2. If $m$ is to be used as the basis for an estimator of the population standard deviation, $\sigma$, then an unbiased estimator would be $m\{\sigma/\mathcal{E}(m)\}$.

In the case considered in §1 the exact value of the multiplying factor depends on the parameter $\alpha$ (as well as on the sample size, $n$). It is, in fact, the reciprocal of the function shown in Table 1. For values of $n$ greater than, say, 15, however, it appears that the likely inaccuracy in $\alpha$ will not be large enough to produce a really serious error in the value used for $\mathcal{E}(m)$.


Table 1. Ratio $\mathcal{E}(m)/\sigma$ for samples from Type III populations

  n \ α*     1       2       3       4       5       6       7       8
   3      0.5926  0.6208  0.6307  0.6359  0.6389  0.6410  0.6425  0.6436
   4      0.6328  0.6607  0.6706  0.6756  0.6786  0.6807  0.6821  0.6832
   5      0.6554  0.6833  0.6932  0.6982  0.7013  0.7033  0.7048  0.7059
   6      0.6698  0.6979  0.7078  0.7129  0.7160  0.7180  0.7195  0.7206
   8      0.6872  0.7156  0.7256  0.7307  0.7338  0.7359  0.7374  0.7385
  10      0.6974  0.7260  0.7360  0.7412  0.7443  0.7464  0.7479  0.7490
  12      0.7040  0.7328  0.7429  0.7481  0.7512  0.7534  0.7549  0.7560
  15      0.7105  0.7395  0.7497  0.7549  0.7580  0.7602  0.7617  0.7628
  20      0.7170  0.7461  0.7564  0.7616  0.7648  0.7669  0.7685  0.7696
  30      0.7233  0.7527  0.7630  0.7683  0.7715  0.7736  0.7752  0.7763
   ∞      0.7358  0.7656  0.7761  0.7815  0.7847  0.7869  0.7884  0.7896

* When α = 1 the population is exponential; when α = ∞ it is Normal.


Table 2. Values of the approximate formula (5)' for the ratio of Table 1

  n \ α*     1       2       3       4       5       6       7       8
   3      0.6008  0.6251  0.6337  0.6381  0.6407  0.6425  0.6437  0.6447
   6      0.6717  0.6989  0.7085  0.7134  0.7163  0.7184  0.7197  0.7208
  10      0.6981  0.7263  0.7363  0.7414  0.7444  0.7465  0.7480  0.7491
  15      0.7109  0.7396  0.7498  0.7550  0.7581  0.7602  0.7617  0.7628
  20      0.7172  0.7462  0.7565  0.7617  0.7648  0.7670  0.7685  0.7696
  30      0.7234  0.7527  0.7631  0.7684  0.7715  0.7737  0.7752  0.7763

* When α = 1 the population is exponential; when α = ∞ it is Normal.


Turning now to the general question of estimating $\sigma$ from a sample value of $m$, it is useful to compare the standard deviations of $m\{\sigma/\mathcal{E}(m)\}$ and $s$, the sample standard deviation. We will assume that the correct multiplier $\sigma/\mathcal{E}(m)$ is used, and neglect the small bias in $s$, regarded as an estimator of $\sigma$. If the sample size, $n$, is large, then

$$\operatorname{var}(s) \approx \frac{\sigma^2}{4n}(\beta_2 - 1), \qquad (7)$$

where $\beta_2$ is the second moment ratio of the distribution of the observed variable $x$.
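A large-sample approximation to the variance of $m$ may be obtained as follows. If $n$ is large, $\bar{x} \approx \mathcal{E}(x)$ and

$$\mathcal{E}(m) \approx \frac{1}{n}\,n\,\mathcal{E}(|x - \mathcal{E}(x)|) = \mathcal{E}(|x - \mathcal{E}(x)|),$$

$$\mathcal{E}(m^2) \approx \frac{1}{n^2}\Big[n\,\mathcal{E}(|x - \mathcal{E}(x)|^2) + n(n-1)\{\mathcal{E}(|x - \mathcal{E}(x)|)\}^2\Big] = \frac{1}{n}\operatorname{var}(x) + \Big(1 - \frac{1}{n}\Big)\{\mathcal{E}(m)\}^2.$$

Hence

$$\operatorname{var}(m) = \mathcal{E}(m^2) - \{\mathcal{E}(m)\}^2 \approx \frac{1}{n}\big[\operatorname{var}(x) - \{\mathcal{E}(m)\}^2\big], \qquad (8)$$

where $\mathcal{E}(m)$ now stands for the population ($n = \infty$) value of the mean deviation.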
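For a unit Normal population $\beta_2 = 3$, $\operatorname{var}(x) = 1$ and the population mean deviation is $(2/\pi)^{1/2}$, so (7) and (8) reduce to simple functions of $n$. A minimal sketch (function names are illustrative) that reproduces the approximate columns of Table 3:

```python
import math

def approx_var_s(n: int, beta2: float = 3.0, sigma: float = 1.0) -> float:
    """Large-sample approximation (7): var(s) ~ sigma^2 (beta2 - 1) / (4n)."""
    return sigma ** 2 * (beta2 - 1.0) / (4.0 * n)

def approx_var_m(n: int, var_x: float, pop_md: float) -> float:
    """Large-sample approximation (8): var(m) ~ [var(x) - (population mean deviation)^2] / n."""
    return (var_x - pop_md ** 2) / n

normal_md = math.sqrt(2.0 / math.pi)  # population mean deviation of N(0, 1)
for n in (5, 10, 20, 30):
    print(n, round(approx_var_s(n), 5), round(approx_var_m(n, 1.0, normal_md), 5))
```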


Table 3. Comparison of exact and approximate formulae for var(s) and var(m) in samples from a unit Normal population

              var(s)                   var(m)
   n      Exact    Approx. (7)     Exact    Approx. (8)
   5     0.09314    0.10000       0.07094    0.07268
  10     0.04854    0.05000       0.03589    0.03634
  20     0.02466    0.02500       0.01806    0.01817
  30     0.01652    0.01667       0.01206    0.01211


To give some idea of the accuracy of these approximations, Table 3 shows (i) the exact values of var(m) in samples from a unit Normal population (Pearson & Hartley, 1954) and the values given by the approximate formula (8), and (ii) a similar comparison for the exact value of var(s) and the value given by approximation (7). Both approximations appear to be sufficiently accurate for our purposes, at any rate for Normal populations. It is interesting to note that a very good approximation to var(m) is given by the simply modified formula


var (m)=2 (1-7) EE (9) 


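Values given by this formula are also shown in Table 3. Using (7) and (8), we have

$$n \times (\text{coefficient of variation of } s)^2 \approx \tfrac14(\beta_2 - 1),$$

$$n \times (\text{coefficient of variation of } m)^2 \approx \frac{\sigma^2}{\{\mathcal{E}(m)\}^2} - 1.$$

Since the constant multiplier $\sigma/\mathcal{E}(m)$ does not alter the coefficient of variation, and both $m\{\sigma/\mathcal{E}(m)\}$ and $s$ have expectation (approximately) $\sigma$, their standard deviations stand in the ratio of these coefficients of variation. Hence, so far as the approximate comparison is valid, $m\{\sigma/\mathcal{E}(m)\}$ will have a smaller standard deviation than $s$ if

$$\frac{\sigma^2}{\{\mathcal{E}(m)\}^2} - 1 < \tfrac14(\beta_2 - 1),$$

i.e. if

$$\beta_2 > \frac{4\sigma^2}{\{\mathcal{E}(m)\}^2} - 3. \qquad (10)$$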


The table below gives limiting values of $\beta_2$ for a few commonly occurring values of $\mathcal{E}(m)/\sigma$:

  E(m)/σ    Limiting β_2
   0.70        5.16
   0.75        4.11
   0.80        3.25
   0.85        2.54
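Criterion (10) is a one-line computation. The sketch below (function names are illustrative) reproduces the limiting values just tabulated and applies the criterion to two populations whose ratios follow from facts stated earlier: the Normal ($\beta_2 = 3$, $\mathcal{E}(m)/\sigma = (2/\pi)^{1/2}$) and the exponential Type III case $\alpha = 1$ ($\beta_2 = 9$, $\mathcal{E}(m)/\sigma = 2/e$).

```python
import math

def limiting_beta2(md_ratio: float) -> float:
    """Right-hand side of criterion (10): 4 sigma^2 / E(m)^2 - 3, where md_ratio = E(m)/sigma."""
    return 4.0 / md_ratio ** 2 - 3.0

def md_estimator_is_better(beta2: float, md_ratio: float) -> bool:
    """True when, to this approximation, m * sigma / E(m) has a smaller SD than s."""
    return beta2 > limiting_beta2(md_ratio)

for r in (0.70, 0.75, 0.80, 0.85):
    print(r, round(limiting_beta2(r), 2))

print(md_estimator_is_better(3.0, math.sqrt(2 / math.pi)))  # Normal population
print(md_estimator_is_better(9.0, 2 / math.e))              # exponential population
```

For the Normal the criterion fails (3 < 3.28), while for the exponential it holds comfortably.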
It may be noted that an approximation of the form used in Table 2, namely

$$\frac{\text{Expected value of mean deviation in sample of size } n}{\text{Population value of mean deviation}} \approx \Big(1 - \frac{1}{n}\Big)^{1/2},$$

would enable an approximately unbiased estimator of the population standard deviation, based on the sample mean deviation, to be obtained. The approximation of equation (8) might then be used to estimate the standard error of this estimate.
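A minimal sketch of the estimator just described, assuming $\mathcal{E}(m) \approx r\,\sigma\,(1 - 1/n)^{1/2}$, where $r$ is the population value of $\mathcal{E}(m)/\sigma$. The function name and arguments are hypothetical. The usage line takes the Type VII case with $\alpha = 4$ ($r = 0.759$) and $n = 5$ discussed in the text.

```python
import math

def sigma_from_mean_deviation(m: float, n: int, pop_md_ratio: float) -> float:
    """Approximately unbiased estimate of sigma from a sample mean deviation m.

    Assumes E(m) ~ pop_md_ratio * sigma * sqrt(1 - 1/n), the approximation used for Table 2,
    where pop_md_ratio is the population value of E(m)/sigma.
    """
    return m / (pop_md_ratio * math.sqrt(1.0 - 1.0 / n))

# Multiplying factor for Type VII, alpha = 4 (E(m)/sigma = 0.759), n = 5.
factor = sigma_from_mean_deviation(1.0, n=5, pop_md_ratio=0.759)
print(round(factor, 3))
```

This gives the multiplier 1.473 quoted in the worked example.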

For example, take the Type VII distribution dealt with in the following section, and suppose we had a sample of $n = 5$ from a population with $\alpha = 4$. The population standard deviation would then be estimated by

$$m \times \{0.759\,(4/5)^{1/2}\}^{-1} = 1.473m.$$

The standard error of this estimate would be estimated as $0.623m$.

3. Application of (10) to the Type III distribution gives the results shown in Table 4. Tables 5 and 6, respectively, give the results of similar calculations for the Type VII distribution

$$p(x) = \frac{(1 + x^2)^{-\alpha}}{B(\tfrac12, \alpha - \tfrac12)} \qquad (-\infty < x < \infty),$$

and for the Type II distribution

$$p(x) = \frac{x^{\alpha}(1 - x)^{\alpha}}{B(\alpha + 1, \alpha + 1)} \qquad (0 < x < 1).$$
From the above three tables it can be seen that $m\{\sigma/\mathcal{E}(m)\}$ has, to the degree of approximation used, a smaller standard deviation than $s$ for symmetrical Pearson curves with $\beta_2 > 3.5$ and for Type III curves with $\beta_2 > 3.35$. These values of $\beta_2$ correspond to only moderate leptokurtosis.
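The $\mathcal{E}(m)/\sigma$ columns of Tables 5 and 6 can be checked from closed forms implied by the Type VII and Type II densities of §3. For Type VII, $\mathcal{E}|x| = 1/\{(\alpha - 1)B(\tfrac12, \alpha - \tfrac12)\}$ and $\sigma^2 = (2\alpha - 3)^{-1}$. For Type II, a symmetric beta distribution with both parameters equal to $a = \alpha + 1$, the mean absolute deviation is $2a^{2a}/\{(2a)^{2a+1}B(a, a)\}$ and $\sigma^2 = \tfrac14(2\alpha + 3)^{-1}$. These closed forms are our own derivation (the second is the standard mean absolute deviation of a beta distribution), so treat the sketch as a cross-check rather than the author's method.

```python
import math

def log_beta(a: float, b: float) -> float:
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def type_vii_md_ratio(alpha: float) -> float:
    """E|x|/sigma for p(x) = (1 + x^2)^(-alpha) / B(1/2, alpha - 1/2), alpha > 3/2."""
    mean_abs = 1.0 / ((alpha - 1.0) * math.exp(log_beta(0.5, alpha - 0.5)))
    sigma = 1.0 / math.sqrt(2.0 * alpha - 3.0)
    return mean_abs / sigma

def type_ii_md_ratio(alpha: float) -> float:
    """E|x - 1/2|/sigma for p(x) = x^alpha (1 - x)^alpha / B(alpha + 1, alpha + 1)."""
    a = alpha + 1.0
    # Mean absolute deviation of a symmetric beta(a, a): 2 a^(2a) / ((2a)^(2a+1) B(a, a)).
    log_md = (math.log(2.0) + 2 * a * math.log(a)
              - (2 * a + 1) * math.log(2 * a) - log_beta(a, a))
    sigma = 0.5 / math.sqrt(2.0 * alpha + 3.0)
    return math.exp(log_md) / sigma

print([round(type_vii_md_ratio(a), 3) for a in (2, 3, 4, 5)])  # compare Table 5
print([round(type_ii_md_ratio(a), 3) for a in (0, 1, 2, 3)])   # compare Table 6
```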
For finite values of $n$ (the sample size) the criterion needs modification. A first approximation to the modified form of the criterion replaces (10) by
dg? 1 
[£(m) 73«/- = (5-3) 


where $\mathcal{E}(m)$ is still the population ($n = \infty$) value of the mean deviation. A glance at Tables 4–6 shows that this modification will not seriously affect our conclusions if the sample size is about fifteen or more.


Table 4. Type III

   α     E(m)/σ
   1     0.736
   2     0.766
   3     0.776
   4     0.782
   5     0.785
   6     0.787


Table 5. Type VII

   α     E(m)/σ    4σ²/{E(m)}² − 3    β_2
   2     0.637         6.87            ∞
   3     0.735         4.40            9
   4     0.759         3.94            5
   5     0.770         3.75            4.20
   6     0.776         3.64            3.86
   8     0.783         3.53            3.55
  10     0.786         3.47            3.40
   ∞     0.798         3.28            3.00

$\sigma^2 = (2\alpha - 3)^{-1}$; $\beta_2 = 3(2\alpha - 3)(2\alpha - 5)^{-1}$.


Table 6. Type II

   α                 E(m)/σ    4σ²/{E(m)}² − 3†    β_2
   0 (rectangular)   0.866         2.33            1.80
   1                 0.839         2.69            2.14
   2                 0.827         2.85            2.33
   3                 0.820         2.95            2.45
   ∞ (Normal)        0.798         3.28            3.00

$\sigma^2 = \tfrac14(2\alpha + 3)^{-1}$; $\beta_2 = 3(2\alpha + 3)(2\alpha + 5)^{-1}$; $\mathcal{E}(m) = \dfrac{2}{B(\alpha + 1, \alpha + 1)}\displaystyle\int_0^{1/2}(\tfrac12 - x)\{x(1 - x)\}^{\alpha}\,dx$.

† For large samples.
AN APPROXIMATION TO THE DISTRIBUTION OF NON-CENTRAL t

By MAXINE MERRINGTON AND E. S. PEARSON
University College London

1. INTRODUCTION

The publication by Resnikoff & Lieberman (1957) of very extensive tables of the non-central t-distribution, reviewed elsewhere in this issue, has been a welcome achievement. Apart from providing a quick solution to a number of specific problems described in the introduction, the Tables make possible further investigation into methods of approximating to this interesting distribution. Two considerations suggested the inquiry described in the present paper: (a) interpolation for the non-central parameter δ is not altogether easy in the Resnikoff & Lieberman Tables, and low values of δ are not covered;* (b) a few comparisons had shown that the non-central t-distribution could be closely represented by a Pearson Type IV curve.

With regard to point (a), it should be stated that the authors of the Tables had in mind their use in certain quality control problems, where the chosen values of δ for each value of the degrees of freedom, f, were particularly appropriate. Nevertheless, as will be seen when the diagrams printed below are discussed, a substantial class of distributions is not covered by the Tables.

In the following pages we shall first explore the relationship between the moment ratios β_1(t), β_2(t) of the distribution and the parameters f and δ. The field of β_1(t), β_2(t) will be found to be almost exactly that of a Pearson Type IV curve. We shall then examine the extent to which the tables of percentage points of standardized Pearson curves (Pearson & Hartley, 1954, Table 42) can be used to give correct values for the percentage points of non-central t. While we have no doubt that alternative and perhaps more accurate methods could be found to supplement the Resnikoff & Lieberman Tables,† we think that the present investigation illustrates the value of the suggestion recently made to us by Professor John, namely that we should extend the table of standardized percentage points for Pearson curves, both as regards the number of points tabled, the number of decimals given and the range of √β_1, β_2 values included.


2. THE DISTRIBUTION AND MOMENTS OF NON-CENTRAL t


Using the notation of Johnson & Welch (1939) and Resnikoff & Lieberman (1957), we write

$$t = \frac{(z + \delta)\sqrt{f}}{\chi}, \qquad (1)$$

where $z$ is distributed normally about zero with unit standard deviation and $\chi$ is distributed independently as $\sqrt{\chi^2}$ with $f$ degrees of freedom. The probability density of $t$ may be expressed in the form

$$p(t \mid f, \delta) = \text{constant}^{\ddagger} \times \Big(1 + \frac{t^2}{f}\Big)^{-\frac12(f+1)}\exp\Big\{-\frac{f\delta^2}{2(f + t^2)}\Big\}\,Hh_f\Big\{-\frac{\delta t}{\sqrt{f + t^2}}\Big\}, \qquad (2)$$

* For example, for f = 8 the lowest entry for δ is 2.03 and for f = 20 it is 3.09.
† Recently Harley (1957) has shown how an approximation which is probably as accurate can be obtained by a transformation of the distribution of the product-moment correlation coefficient.
‡ The 'constant' contains f but not δ.


where

$$Hh_f(y) = \int_0^{\infty}\frac{v^f}{f!}\exp\{-\tfrac12(v + y)^2\}\,dv \qquad (3)$$

is the tabled $Hh$-function (Fisher, 1931).

The Type IV curve can be written in the form

$$p(x \mid a, m, \nu) = \text{constant} \times \Big(1 + \frac{x^2}{a^2}\Big)^{-m}\exp\Big\{-\nu\tan^{-1}\frac{x}{a}\Big\}, \qquad (4)$$

where $-\infty < x < \infty$. It would therefore be tempting to equate $a$ to $\sqrt{f}$ and $m$ to $\tfrac12(f + 1)$, and to determine empirically a value for $\nu$ which would bring roughly into agreement the last two terms on the right-hand side of (2) and the last term in (4). This might be done by seeking approximate agreement between some of the lower moments of the two distributions. Very little would, however, be gained by this approach, as no tables for the Type IV distribution exist which can be entered with the parameters $a$, $m$ and $\nu$. One point of interest does, however, arise from the comparison.

For the distribution of equation (4), $m$ is a function of the moment ratios $\beta_1$, $\beta_2$; in fact

$$m = \frac{5\beta_2 - 6\beta_1 - 9}{2\beta_2 - 3\beta_1 - 6}.$$

Thus if we were to equate $m$ to $\tfrac12(f + 1)$, we should have a relation

$$\beta_2 = \frac{3(f - 3)\beta_1 + 6(f - 2)}{2(f - 4)}. \qquad (5)$$

This implies that for constant $f$, the empirical Type IV distributions would have $\beta_1, \beta_2$ points lying on a series of straight lines in the $(\beta_1, \beta_2)$ plane which all pass through the point $\beta_1 = -4$, $\beta_2 = -3$ and cut the $\beta_2$ axis at the correct points, $\beta_2 = 3(f - 2)/(f - 4)$, for the central distribution, for which $\delta = 0$ and $\beta_1 = 0$. Examination of Fig. 1 shows that the contours of the true $\beta_1(t), \beta_2(t)$ points for constant $f$ are in fact not far from straight lines, although the slope of the best approximating lines is not quite that of equation (5). A better approximation is provided by equation (10) given below.
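A quick numerical check of relation (5): along the line (5), the Type IV exponent $m$, computed from $\beta_1, \beta_2$ by the standard Pearson relation quoted above, should equal $\tfrac12(f + 1)$, and the line should pass through $(-4, -3)$ and meet the $\beta_2$ axis at $3(f-2)/(f-4)$. Function names are illustrative.

```python
def type_iv_m(beta1: float, beta2: float) -> float:
    """Exponent m of the Type IV curve (4) in terms of its moment ratios."""
    return (5 * beta2 - 6 * beta1 - 9) / (2 * beta2 - 3 * beta1 - 6)

def beta2_on_line(beta1: float, f: float) -> float:
    """Relation (5): the beta2 implied by setting m = (f + 1)/2 at a given beta1."""
    return (3 * (f - 3) * beta1 + 6 * (f - 2)) / (2 * (f - 4))

f = 10.0
for b1 in (0.0, 0.2, 0.5):
    b2 = beta2_on_line(b1, f)
    print(b1, b2, type_iv_m(b1, b2))  # m stays at (f + 1)/2 = 5.5
```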

Before attempting to relate the parameters f and δ to those of a particular approximating curve, it will be useful to examine the relation of the moment ratios β_1(t), β_2(t) to f and δ.

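The first four moments of $t$ about zero are readily obtained from those of $z$ (normal) and of the reciprocal of $\chi$, since

$$\mathcal{E}(t^r) = f^{r/2}\,\mathcal{E}\{(z + \delta)^r\} \times \mathcal{E}(\chi^{-r}), \qquad (6)$$

where

$$\mathcal{E}(\chi^{-r}) = \frac{\Gamma\{\tfrac12(f - r)\}}{2^{r/2}\,\Gamma(\tfrac12 f)}. \qquad (7)$$

Thus

$$\mu_1'(t) = \Big(\frac{f}{2}\Big)^{1/2}\frac{\Gamma\{\tfrac12(f - 1)\}}{\Gamma(\tfrac12 f)}\,\delta, \qquad \mu_2'(t) = \frac{f}{f - 2}(1 + \delta^2),$$

$$\mu_3'(t) = \frac{f^{3/2}\,\Gamma\{\tfrac12(f - 3)\}}{2^{3/2}\,\Gamma(\tfrac12 f)}\,\delta(3 + \delta^2), \qquad \mu_4'(t) = \frac{f^2}{(f - 2)(f - 4)}(3 + 6\delta^2 + \delta^4). \qquad (8)$$

Central moments and moment ratios for particular values of $f$ and $\delta$ can be calculated by first obtaining the numerical values of these moments about zero and then converting to moments about the mean or, perhaps more expediently, from the relations

$$\mu_2 = \frac{f}{f - 2}(1 + \delta^2) - \{\mu_1'(t)\}^2,$$

$$\mu_3 = \mu_1'(t)\Big[\frac{f(2f - 3 + \delta^2)}{(f - 2)(f - 3)} - 2\mu_2\Big],$$

$$\mu_4 = \frac{f^2}{(f - 2)(f - 4)}(3 + 6\delta^2 + \delta^4) - \{\mu_1'(t)\}^2\Big[\frac{f\{(f + 1)\delta^2 + 3(3f - 5)\}}{(f - 2)(f - 3)} - 3\mu_2\Big]. \qquad (9)$$

The value of $\mathcal{E}(\chi/\sqrt{f})$ is given to six decimal places in Biometrika Tables for Statisticians, Table 35, for $f = \nu = 1(1)20(5)50(10)100$, making the calculation of the mean straightforward for low degrees of freedom. Using these equations, values of $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$ were calculated for every intersection of the $f$ and $\delta$ contours shown in Fig. 1.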
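The calculation just described can be sketched in Python (standard library only; function names are illustrative). The sketch computes the moments about zero from (6) to (8) and the moment ratios by two routes, through the relations (9) and by direct conversion of the moments about zero, which is a useful check on the algebra. At $\delta = 0$ it should give $\beta_1 = 0$ and Student's $\beta_2 = 3(f-2)/(f-4)$.

```python
import math

def inv_chi_moment(f: float, r: int) -> float:
    """E(chi^-r) for chi with f degrees of freedom, equation (7); requires r < f."""
    return math.exp(math.lgamma((f - r) / 2) - math.lgamma(f / 2)) / 2 ** (r / 2)

def raw_moments(f: float, delta: float) -> list[float]:
    """Moments of t about zero, mu'_1 .. mu'_4, from equations (6)-(8)."""
    ez = [delta, 1 + delta ** 2, delta ** 3 + 3 * delta, delta ** 4 + 6 * delta ** 2 + 3]  # E{(z+delta)^r}
    return [f ** (r / 2) * ez[r - 1] * inv_chi_moment(f, r) for r in range(1, 5)]

def moment_ratios_via_9(f: float, delta: float) -> tuple[float, float, float, float]:
    """Mean, mu_2, beta_1, beta_2 of non-central t from the relations (9); needs f > 4."""
    mu1 = raw_moments(f, delta)[0]
    mu2 = f * (1 + delta ** 2) / (f - 2) - mu1 ** 2
    mu3 = mu1 * (f * (2 * f - 3 + delta ** 2) / ((f - 2) * (f - 3)) - 2 * mu2)
    mu4 = (f ** 2 * (3 + 6 * delta ** 2 + delta ** 4) / ((f - 2) * (f - 4))
           - mu1 ** 2 * (f * ((f + 1) * delta ** 2 + 3 * (3 * f - 5)) / ((f - 2) * (f - 3)) - 3 * mu2))
    return mu1, mu2, mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2

def moment_ratios_direct(f: float, delta: float) -> tuple[float, float]:
    """beta_1, beta_2 by converting the moments about zero to central moments directly."""
    m1, m2, m3, m4 = raw_moments(f, delta)
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2

print(moment_ratios_via_9(12.0, 1.5)[2:], moment_ratios_direct(12.0, 1.5))
```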
It will be seen at once how nearly linear the f-contours are. An empirical calculation suggests that, within the area of this diagram, the contours for f are represented quite well by the system of straight lines


wean CERE 
EL 290-4) 


JOE [-3:2 3(f— 2) 
f» = 1-406 poate rer] : 
in place of the system of equation (5), which was obtained by equating m of the Type IV curve (4) to ½(f + 1).
A second point to be noted is the way in which the δ-contours crowd up towards the limiting curve for δ = ∞. In the limit (as δ → ∞), t/δ is distributed as √f/χ, which has for its sth moment about zero

$$\mathcal{E}\{(\sqrt{f}/\chi)^s\} = \frac{f^{s/2}\,\Gamma\{\tfrac12(f - s)\}}{2^{s/2}\,\Gamma(\tfrac12 f)}. \qquad (11)$$

Although the limiting moments of t are infinite, the beta-coefficients are finite and were determined from (11) for the values of f used in the diagram, leading to the limiting contour marked δ = ∞. This curve lies just below the curve

$$\beta_1(\beta_2 + 3)^2 = 4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6),$$

on which lie the β_1, β_2 points for the Pearson Type V curve, which forms the upper boundary of the Type IV area. (See the chart of Table 43, Pearson & Hartley, 1954.) Thus the β_1, β_2 points for the non-central t-distribution lie entirely within the Type IV area. It is of interest to note that it is the reciprocal of χ², not of χ, which has a Type V distribution. Thus the limiting form of the Type IV approximation does not provide the correct distribution as δ → ∞, although it does at the other boundary, when δ = 0 and Type IV turns into Type VII, or Student's distribution.
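The statements about the δ = ∞ limit can be checked with Pearson's criterion $\kappa = \beta_1(\beta_2 + 3)^2/\{4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)\}$, for which $0 < \kappa < 1$ is Type IV and $\kappa = 1$ is the Type V curve displayed above. The sketch below (our illustration, with hypothetical function names) takes the limiting moments from (11) and, for contrast, the moments of $1/\chi^2$, which are $\mathcal{E}(\chi^{-2s})$ from (7). The first should fall inside the Type IV area and the second exactly on the Type V line.

```python
import math

def pearson_kappa(beta1: float, beta2: float) -> float:
    """Pearson's criterion: 0 < kappa < 1 is Type IV, kappa = 1 is Type V."""
    return beta1 * (beta2 + 3) ** 2 / (4 * (4 * beta2 - 3 * beta1) * (2 * beta2 - 3 * beta1 - 6))

def ratios_from_raw(m1: float, m2: float, m3: float, m4: float) -> tuple[float, float]:
    """beta_1, beta_2 from the first four moments about zero."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2

def inv_chi_moment(f: float, s: int) -> float:
    """E(chi^-s) for chi with f degrees of freedom, equation (7); requires s < f."""
    return math.exp(math.lgamma((f - s) / 2) - math.lgamma(f / 2)) / 2 ** (s / 2)

def limit_kappa(f: float) -> float:
    """kappa for t/delta as delta -> infinity, i.e. sqrt(f)/chi, using (11)."""
    return pearson_kappa(*ratios_from_raw(*[f ** (s / 2) * inv_chi_moment(f, s) for s in (1, 2, 3, 4)]))

def inverse_chi2_kappa(f: float) -> float:
    """kappa for 1/chi^2, whose s-th moment is E(chi^-2s); requires f > 8."""
    return pearson_kappa(*ratios_from_raw(*[inv_chi_moment(f, 2 * s) for s in (1, 2, 3, 4)]))

for f in (10.0, 20.0):
    print(f, round(limit_kappa(f), 4), round(inverse_chi2_kappa(f), 6))
```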


3. THE TYPE IV APPROXIMATION TO THE PERCENTAGE POINTS OF t


There are no available tables of the probability integral of a Type IV distribution, but four upper and four lower percentage points for any distribution with β_1 ≤ 1.0 (and β_2 within the tabled range) can be found by interpolation in Table 42 of Biometrika Tables for Statisticians, 1. We have made comparisons of upper and lower 5, 1 and 0.5% points for the 19 cases having the f and δ values specified in Table 1. True values were obtained either from Resnikoff & Lieberman's table of percentage points or from Johnson & Welch's (1939) tables. The approximate values were found as follows:

(a) The first four moments of t were calculated from equations (9).

(b) From these, values of β_1(t), β_2(t) were obtained.

(c) Linear interpolation in Table 42 then gave the standardized deviates for the required percentage points.

(d) The approximate percentage points for t were then calculated from the relation

$$\text{percentage point for } t = \mu_1'(t) + \sigma(t) \times \text{standardized percentage point}. \qquad (12)$$
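Step (d) is a linear rescaling, sketched below. The standardized Type IV deviate must be supplied by the user (for example by interpolation in a table of standardized Pearson percentage points at the computed β_1, β_2); this sketch does not compute it, and the function names are hypothetical. As a sanity check of the rescaling only: with δ = 0 and f = 10, σ(t) = √1.25, so feeding in the standardized value of Student's upper 5% point 1.812 should return 1.812.

```python
import math

def noncentral_t_mean_sd(f: float, delta: float) -> tuple[float, float]:
    """Mean and standard deviation of non-central t from the first two moments in (8); f > 2."""
    mean = delta * math.sqrt(f / 2.0) * math.exp(math.lgamma((f - 1) / 2) - math.lgamma(f / 2))
    var = f * (1 + delta ** 2) / (f - 2) - mean ** 2
    return mean, math.sqrt(var)

def approx_percentage_point(f: float, delta: float, standardized_point: float) -> float:
    """Relation (12): mu'_1(t) + sigma(t) * standardized percentage point.

    standardized_point is an input (e.g. interpolated from a table of standardized
    Pearson Type IV deviates for the computed beta_1, beta_2); it is not computed here.
    """
    mean, sd = noncentral_t_mean_sd(f, delta)
    return mean + sd * standardized_point

print(approx_percentage_point(10.0, 0.0, 1.812 / math.sqrt(1.25)))
```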


In so far as the β_1, β_2 values for given f and δ can be read from Fig. 1, it is only necessary to calculate the mean and standard deviation of t from the first two equations of (9), a quick process if the mean value of χ/√f is available in Table 35 of Biometrika Tables for Statisticians, 1. However, as this simplification sometimes introduces small last-figure errors (the graph being difficult to read correctly between the contours), we carried out the full calculation of β_1, β_2 in every case.

Two points should be noted about the accuracy of the comparison. In the first place, the standardized deviates in the Biometrika Table 42 are only given to two decimal places. After interpolation and multiplication by σ(t) (for which the values are given in the 4th column of Table 1), it is clear that an error of 1 unit in the 2nd decimal place of the approximate percentage point must often occur, due solely to the limited scope of Table 42 and not to the inadequacy of the Type IV approximation.

Secondly, Resnikoff & Lieberman's tables of percentage points of t/√f were calculated (see their Introduction, pp. 27–8) by six-point inverse Lagrangian interpolation in the 4-decimal place tables of the probability integral. They record their percentage points to three decimal places and remark that these 'are believed to be correct in the second decimal place throughout, and to differ occasionally from the true values by no more than one or two units in the third decimal'. It is therefore a little difficult to say what error may be expected to result when these values of t/√f are multiplied by √f to give percentage points of t.

However, on comparing the results of our Type IV approximation with the true values derived from Resnikoff & Lieberman's percentage point tables or from Johnson & Welch's tables, it was only for the large values of f of 34, 44 and 49 that we found any differences greater than 1 unit in the second decimal place. On going back to the probability integral table of Resnikoff & Lieberman for these f values and making fresh inverse interpolations, we obtained rather different values for t/√f at the lower percentage points. When these adjusted values were multiplied by √f, all the differences of over 0.01 disappeared.

The figures in the last six columns of Table 1 contain these adjustments for f = 34, 44 and 49; otherwise they have all been derived from Johnson & Welch's or Resnikoff & Lieberman's tables.

Thus, as far as we can tell, within the range of f and δ covered by this investigation the Type IV approximation seems to provide values of both upper and lower 5, 1 and 0.5% points for non-central t which are not in error by more than 0.01. It is very likely that in some cases the approximation could be more accurate if tables of standardized deviates of Type IV distributions were available to three decimal places.
The points marked in Fig. 2 with double circles correspond to distributions with the lowest δ values (for given f) tabled by Resnikoff & Lieberman.

Fig. 2. Chart showing points at which the accuracy of the Type IV approximation was tested. The points surrounded with two circles correspond to lower-limit δ values in the R. and L. Tables. [Axes: scale of β_1 (horizontal), scale of β_2 (vertical).]

Thus, while the Tables cater for the situations most commonly met, it is clear that there is a wide class of distributions, with the central t-distribution at one boundary, which is not covered by the Resnikoff & Lieberman tables. It is in this region that the Type IV approximation is likely to be at its best.


4. CONCLUSION

The present investigation has shown that, in the region of the upper and lower 5 to 0.5% points, the Pearson Type IV curve provides a very good approximation to the non-central t-distribution over a wide range of values of f and δ. In full, this approximation involves calculating the first four moments of t from equations (9) and then using the table of standardized deviates (Biometrika Tables for Statisticians, 1, Table 42). For many purposes it will probably be adequate to calculate only μ_1'(t) and σ(t) and read off the β_1, β_2 values from the chart of Fig. 1.
We realize, of course, that it is more likely in practice that the probability integral of
non-central t rather than the percentage points will be required. We therefore regard our 
investigation as in part a contribution to a wider subject: the practical utility of tables 
which enable a single system of non-normal curves to be used in approximating to the 
percentage points of other untabled distributions, through the use of four moments. 

It is planned to take steps to extend Table 42 to give 3-decimal accuracy in the standardized percentage points, further points in addition to the eight already tabled and a more extended range of β_1, β_2 values.
By F. G. FOSTER

1. INTRODUCTION


These tables extend the tabulation of the 80, 85, 90, 95 and 99% points of $I_x(k; p, q)$ to the case $k = 4$. They are a continuation of the tables for the cases $k = 2, 3$ given in Foster & Rees (1957) and Foster (1957). These papers will be referred to as 'I' and 'II', and reference is made to them for definitions.

The interpolation requirements are similar to those for II. Some uses of the tables are indicated in I and II.


2. METHOD OF COMPUTATION 


The computations were again carried out on the DEUCE Computer of the English Electric Company. Substantially the programme for II was used, with, of course, a modification for the computation of the function $I_x(4; p, q)$. Let $\theta_{\max}$ denote the greatest root of

$$|B - \theta(A + B)| = 0,$$

where A and B are independent estimates, based on $\nu_1$ and $\nu_2$ degrees of freedom, of a parent dispersion matrix of a four-dimensional multinormal distribution. Define

$$I_x(4; p, q) = \Pr\{\theta_{\max} \le x\},$$

where $p = \tfrac12(\nu_1 - 3)$, $q = \tfrac12(\nu_2 - 3)$. Then by a similar method to that used in I and II, and employing a formula of Roy (1958), we obtain


929 +1) L(4; p,q) = L(2p + 4,2) L(2; p,q) (p + 1) (2p +3) (2p + 2q + 1) (p+q+1) 
— L(2p + 3,29) L(2; p,q) (p +1)? (2p + 2g + 1) (2p +24 + 3) 
+1,(2p 2,29) (2; p +1, ) p(2p + 1) (p -q-- 1) (2p +.2q +3) 
— L(2p + 1, 2% I (2p + 3, 2q) p(p + 1) (2p + 2q + 1) (2p +29 +3) 
+ L(2p--3,2q) (p,q) a (p + 1,9) (2p + 1) (p+ 1) (p +4) (2p - 2q +3) 
— L(2p +2,24) L(p,q) 4.00 2. 0 (2p 1) (2p +3) (p-q) (p q-1) 
+ L(2p + 1,29) L(p-- 1,4) a,(p +2, 0 p(2p 4-3) (2p + 2g +1) (p q-- 1) 
+ LG; p,q) b. 2h ＋ 2, 2% (p +1)? (2p + 2 1) (2p + 2g +3). 


The ranges of p and q were chosen as in II, namely $p = \tfrac12(\tfrac12)4$, $q = 1(1)96$. On this occasion a simplification of the programme was introduced by use of the same relations for the integral and half-integral values of p in the computation of all of $I_x(p, q)$, $a_x(p, q)$ and $b_x(p, q)$. These relations were:

L(p,0)-9 (p25p) 
L(po, 1) = «Pe, 
$I_x(p + 1, q + 1) = x\,I_x(p, q + 1) + (1 - x)\,I_x(p + 1, q) \qquad (p \ge p_0,\ q \ge 0),$
I.( Po: 4 * 1) = L. Po q) S {lpo q)- Lips 1,4) (q2 1); 
&, (po, 0) = , 


a,(p+1,0) =2a,(p,0) (P> po 
a, (po, 1) = (2p9 1) (1 — x) , 


a, (po 1,q4*1)-— 4.877470 
2po+2q+1 


2q+1 
b b, O) O (P> Po) 


Jeep ye 072p b) (ro 129. 


a,( Po 4+1) = (1-2)a4(psd) e 1): 


1 
b. (p, 1) Pot wena —2), 
Po 


b,(p+1,q+1) = _P(P+I+?) (ug (p,qgt1)+(1—2)b(P+1,9)} (P22» 029) 


~ 1+p(p+q+2) 
1 
be Dog +1) = PE (1—a)bal Pog) (0? 1) 


In these relations $p_0$ was given the value either ½ or 1. (Inspection of the formula for $I_x(4; p, q)$ shows that in fact both values of $p_0$ are required for $I_x(p, q)$, but only $p_0 = 1$ for $a_x(p, q)$ and only $p_0 = 1$ for $b_x(p, q)$.)

The percentage points were then obtained exactly as in II.
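The first recurrence above is a standard identity of the incomplete beta-function ratio. Assuming, as the doubled arguments in the formula for $I_x(4; p, q)$ suggest, that $I_x(p, q)$ here denotes that ordinary ratio, the sketch below checks the identity for integer arguments using the exact binomial-sum form $I_x(a, b) = \sum_{j=a}^{a+b-1}\binom{a+b-1}{j}x^j(1-x)^{a+b-1-j}$. The half-integer values of p used in the programme are not covered by this form, and the function names are hypothetical.

```python
from math import comb

def incomplete_beta_int(x: float, a: int, b: int) -> float:
    """Regularized incomplete beta I_x(a, b) for positive integers a, b (binomial-sum form)."""
    n = a + b - 1
    return sum(comb(n, j) * x ** j * (1 - x) ** (n - j) for j in range(a, n + 1))

def recurrence_residual(x: float, p: int, q: int) -> float:
    """LHS minus RHS of I_x(p+1, q+1) = x I_x(p, q+1) + (1-x) I_x(p+1, q); needs p, q >= 1."""
    lhs = incomplete_beta_int(x, p + 1, q + 1)
    rhs = x * incomplete_beta_int(x, p, q + 1) + (1 - x) * incomplete_beta_int(x, p + 1, q)
    return lhs - rhs

print(max(abs(recurrence_residual(x, p, q))
          for x in (0.1, 0.5, 0.9) for p in range(1, 6) for q in range(1, 6)))
```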


3. FURTHER WORK 


The existing programme could now be used with little modification for extensions of any of the Tables 1, 2 or 3 to higher values of p or q. It could also be used for computing percentage points of the beta distribution, $I_x(p, q)$, itself. The method of computation is probably feasible for k = 5, but this would be near the limit of its usefulness, since provision for round-off errors makes the computation increasingly lengthy. Beyond this point, a new theoretical approach is probably required. No further tabulations are presently contemplated.


The author is again indebted to the staff of the London Computing Service for assistance 
and to the Director of Research of the English Electric Company for permission to use their 
DEUCE Computer. 
Generalized Beta distribution: 100P% points for x

ν_2    P      ν_1 = 4       5       6       7       8       9      10
5 080 | 0.9725 | 0-9779 | 09816 0-9842 | 0.9862 | 0-9877 | 0-9889 
85 9799 9839 9865 9885 -9899 -9910 9919 
-90 -9869 -9895 -9913 :9925 -9934 9942 9947 
95 9936 9949 9957 9963 9968 9972 9974 
| 99 | 9987 -9990 -9992 -9993 | -9994 9994 -9995 
7 080 | 0-8902 | 0-9083 | 0-9212 | 0-9309 | 0-9385 | 0-9445 | 0-9495 
“85 -9078 -9231 -9340 -9422 -9485 9536 9578 
| -90 927 -9394 9480 | -9545 -9595 -9636 -9668 
| 95 +9505 -9589 -9648 9692 -9726 9754 9776 
99 9789 9825 9850 9869 9884 9895 | 905 
9 0:80 | 0-8034 | 0-8311 | 0-8518 | 0-8678 | 0-8807 | 0-8913 | 0-9001 
-85 8264 “8512 8695 8838 +8952 -9045 9123 
-90 +8532 -8744 +8900 -9022 -9118 9197 9263 
95 8882 90⁴⁵ 9166 9259 9333 9393 -9443 
-99 -9381 9473 -9541 -9593 -9634 -9668 9696 
11 0-80 | 0.7259 | 0.7596 | 0-7856 | 0-8063 | 0-8233 0.8374 0.8494 
-85 7514 7824 8062 8251 -8405 8534 8643 
-90 7820 8098 8306 -8473 8610 8723 -8819 
-95 8236 8463 8636 +8773 8884 8976 9054 
-99 -8885 -9032 -9144 -9231 9302 9361 -9410 
13 0-80 | 0-6594 | 0-6965 | 0-7259 | 0-7498 | 0.7697 0.7865 0-8010 
85 -6858 7206 7479 -7701 7886 -8042 8176 
-90 -7180 7497 7746 7947 “8114 -8255 8376 
-95 7632 -7904 8116 8287 -8429 8548 -8650 
-99 -8374 8567 -8716 -8836 -8934 -9017 -9088 
15 080 | 0:6028 | 0:6417 | 0-6730 | 0-6989 | 0-7208 | 0:7396 0.7559 
85 6292 6661 6957 7201 7407 7584 7737 
-90 6619 6961 7235 7460 -7650 -7812 -7952 
95 7085 7387 -7628 -7825 7991 8131 8253 
-99 7882 8110 8290 8436 +8559 8663 8752 
17 0-80 | 0-5544 0.5940 0-6264 06536 0-6768 | 06969 0.7146 
85 5804 6183 | -6492 “6751 | 6971 7162 7329 
-90 -6128 -6485 6774 -7016 1222 7399 7⁵⁵⁴ 
“95 6599 -6920 7180 7396 7570 7737 7874 
99 7424 7677 7881 8049 8190 8312 “8417 
19 0-80 | 0-5128 | 0-5524 | 0-5853 | 0-6131 | 0-6372 0-6582 | 0:6768 
85 5381 5763 6079 6346 6576 6777 6955 
-90 5700 6063 6361 6614 6830 7019 7185 
95 6166 6499 6771 -7000 7197 ‘7367 | 7517 
-99 7003 7275 7495 7680 7837 7973 8092 
21 080 | 0:4768 | 05160 | 0-5489 | 0-5770 0,6018 0-6231 06423 
85 5013 6394 | 5712 5983 6219 6427 6611 
900 5323 5688 5991 -6250 6474 6670 6845 
-95 5782 6120 6400 6638 -6844 -7024 -7183 
-99 6619 69003 7136 7333 "1502 7650 7780 
23 080 0-4453 | 04839 | 0:5165 | 0:5447 | 0-5693 | 0-5912 | 0:6108 
85 -4690 5066 5388 5656 -5895 6107 6296 
90 4991 -5353 “5658 +5920 -6148 -6350 6530 
95 5439 5779 6004 6307 6519 -6706 6872 
99. (6270 6561 6803 7010 7188 7345 7483 
This table gives the values of x for which $\Pr(\theta_{\max} \le x) = I_x(4; p, q) = P$, where $p = \tfrac12(\nu_1 - 3)$, $q = \tfrac12(\nu_2 - 3)$.
25 0-80 0-4177 | 0:4554 0-4876 0:5155 0-5402 0-5622 | 0:5820 0-6000 


-85 4405 4774 5089 -5361 5601 -5815 4 s | 
90 | +4098. | -5054 | -5358 | 5621 -5851 | -6056 2940 | pere 
:95 133 | 5472 5768 -6004 | 6220 6412 | -6583 | 6737 | 
99 5951 0247 645 -6708 | -6894 | 707 7203 | -7334 
27 080 | 03932 | 0:4300 | 0-4616 | 04803 | 0-138 0:5358 | 0:5557 | 0-5738 
“85 4152 4513 4823 5094 -5333 | -5548 | 5741 5917 
30 | £53 4786 | -5086 | -5348 | 5580 5786 5973 -6141 
-95 | 4868 51904 -5479 | 5727 | -5945 | -6139 | 6314 | -6472 
:99 5661 | -5959 | -6211 -6428 | -6619 | 6787 -6938 | 7075 


29 yon 0:3713 0:4072 0:4382 0-4655 | 0-4898 | 05116 0-5315 0-5496 
85 3925 4279 4588 4851 5089 5303 54969 5673 


:90 4197 +4543 4840 5100 5331 -5538 | 5725 5896 
-95 -4609 | 4941 -5225 5472 5691 -5887 | 6064 6225 
99 5395 5694 5948 6168 6362 6534 6690 6830 
31 080 | 03518 | 0:3867 0-4170 | 0-4438 | 0.4678 | 0-4895 | 0-5092 | 0-5274 
85 +3722 4067 4365 4629 4868 -5077 | 5271 | -5448 


“90 3985 4323 -4615 4873 5102 5309 5496 5668 
95 4384 A711 4992 5238 5457 56654 5832 5995 


99 5152 5450 5704 5926 -6122 6297 6456 -6600 


33 080 | 03341 0.3681 | 03977 | 04240 | 0-4476 | 04691 0:4887 | 05067 
85 3538 3874 4167 4426 -4659 4870 5062 5239 


90 3792 4123 4410 4664 4891 5097 5284 -5456 

| 95 4180 4500 -4778 | 50022 -5240 5437 5616 5779 
| 99 4929 5224 5478 5700 5898 6075 +6236 6382 
35 080 03181 | 0-3511 0:3801 | 0:4058 | 0.4291 0:4503 | 0-4697 | 0:4876 
85 3371 3699 3985 -4240 4469 44678 4869 -5044 

5087 5258 

95 3993 4307 4581 -4822 -5039 -5235 “5414 5578 

99 ‘4724 -5016 -5268 -5490 +5688 5867 6029 6177 


37 080 | 03036 | 0.3357 | 0:3639 | 0:3891 0.4120 0.4329 0-4521 | 0:4698 
85 +3220 3538 3818 4068 4294 4500 4689 4863 
90 3457 3773 4049 4295 4516 -4718 4903 5074 
95 +3822 4130 4399 4637 4852 5047 +5225 +5389 
99 4534 4823 5073 5294 5492 5671 5834 5983 


39 080 0:2903 0:3215 0:3490 | 0:3738 0:3962 | 04107 | 04357 0:4532 
85 3081 3391 3665. 3910 4132 4335 +4522 4695 
90 3311 3618 „3889 4131 4349 4549 4732 4901 
4230 4466 4678 4871 5048 +5212 
5308 5487 5650 5800 


95 3664 3966 
“99 4359 4644 4891 5111 


41 080 | 0.2782 0-3085 | 0:3353 0.8595 0,3815 
88 2959. 328 i 3768) | BUS) 
0 3176 34% 0 -3 921. [19-8978 0| 00:5 199 4300 4572 4740 
95 3519 3814 40% 40 | 4516 .4707 | 4883 5046 
99 4196 4476 +4722 4939 5135 5314 5477 5627 


43 0:80 0-2670 0:2965 0.3227 | 0-3463 0:3679 | 0:3878 0-4061 0:4233 
3130 3391 3626 3841 4037 4219 4389 
3837 4048 :4242 +4422 4588 
4364 4553 4727 4889 
5313 5464 


0-4017 0:4204 0:4377 
4181 4365 4537 


85 2836 
90 3051 3345 -3604 
95 3385 3674 3929 4157 
99 -4044 -4321 -4563 -4778 4973 5151 


This table gives the values of $x$ for which $\Pr(\theta_{\max} \le x) = I_x(4; p, q) = P$, where $p = \tfrac12(\nu_2 - 3)$, $q = \tfrac12(\nu_1 - 3)$.


90 -3617 3940 -4222 4472 4697 4901 
Generalized Beta distribution (cont.)
" 4 5 6 | 7 | 8 9 | 10 a 
Và | 
| P | | | | 
45 0-80 | 0-2566 | 0-2853 | 0-3109 | 0-3340 | 0-3552 | 0-3747 | 03928 0.4097 
-85 +2727 -3014 -3269 3499 | -3710 3903 -4083 4250 
| -90 -2936 -3222 3470 “3704 -3913 4104 4281 4446 
| -95 -3260 -3543 -3793 -4017 4921 4409 581 | 4742 
-99 -3903 4175 4414 4627 -4820 -4997 -5159 | -5309 
47 080 | 0-2471 | 0-2750 | 0-2999 | 0-3226 | 0-3433 | 0-3625 | 0-3803 | 0-3969 
85 2626 2906 3135 3381 3587 3778 3954 | -4119 
“90 2830 3109 3357 3581 3786 3974 4149 4312 
95 3144 3421 3666 3887 4088 -4273 4444 | 4603 
-99 3772 4039 4274 4485 4676 4851 5013 58162 
49 080 | 0-2382 | 0-2654 0.2897 0.3119 0-3322 | 0-3510 0.3688 | 0-3850 
85 2533 -2806 -3049 -3270 -3472 -3660 3834 -3997 
-90 -2730 -3003 -3246 3465 3666 3852 -4024 -4185 
95 3036 3307 3548 3764 3963 -4145 4314 -4471 
99 -3648 3911 4143 4351 -4540 4714 | 4874 -5023 
51 080 | 0.2299 | 0-2564 | 0-2802 | 0-3018 | 0-3218 | 0-3403 | 0-3575 | 0-3737 
85 -2446 -2712 +2950 3166 3365 3549 3720 3881 
90 2638 2904 -3141 -3357 3554 3737 3907 4066 
-95 -2936 3200 3436 3649 3844 4024 4191 4347 
99 3533 3791 4019 4224 4412 -4584 -4743 -4891 
53 080 | 0.2222 | 0.2480 | 0:2713 0-2924 | 0:3120 0.3301 | 0-3471 | 0-3630 
85 -2364 2624 -2857 3068 3263 -3444 3613 | 3271 
-90 2551 -2811 -3043 -3255 3449 3629 -3796 -3953 
-95 -2841 3100 3332 “3541 3733 3910 -4075 -4229 
-99 3424 -3677 3902 4105 4290 4460 4618 4765 
55 0.80 | 0-2149 | 0-2402 | 0-2629 | 0:2836 0-3028 | 0.3206 | 0:3372 0.3529 
85 2288 -2542 -2769 2977 3168 3346 -3512 3668 
-90 -2470 2724 -2951 3159 3349 -3526 -3691 -3846 
-95 :2752 -3006 3233 3439 3028 3802 3965 4117 
99 +3322 3571 3792 3992 4175 4343 4499 -4645 
57 080 | 02082 | 0:2328 | 02550 0.2753 0-2941 | 0-3116 | 0:3279 | 0-3434 
-85 -2217 -2464 2687 2890 3078 3253 -3416 -3570 
-90 2394 +2642 -2865 3068 3255 3429 -3592 -3745 
95 2669 -2918 -3140 3342 3528 3700 3861 4011 
99 +3225 3470 3687 -3885 4065 4232 -4387 -4531 
59 0:80 | 0-2018 | 0:2258 | 0-2475 02674 | 0.2858 | 03030 03191 | 0.3343 
85 +2150 2391 2609 2808 -2993 3164 3325 347 
90 +2322 2565 2783 2983 3167 +3338 3498 3648 
-95 -2591 2834 3052 3251 3434 3603 3762 3910 
-99 -3134 +3374 -3589 3783 -3961 4126 4279 +4422 
61 080 | 01958 0.2193 0-2405 | 0-2600 | 0-2781 02949 0-3108 0.3257 
“85 2087 +2323 2536 27³¹· -2912 -3081 | 2239 +3388 
+90 +2254 +2492 +2706 -2902 -3082 3251 2408 3557 
95 2517 2755 2969 3164 3344 3511 3668 3814 
:99 :3048 3284 3495 3686 3862 4025 4177 4319 
63 0-80 011902 | 02131 | 0-2339 | 0-2530 | 0-2707 02873 | 0-3029 | 0-3176 
85 -2027 2258 | 2467 -2658 -2836 -3001 3157 -3304 
90 -2191 :2423 | 2633 -2825 -3002 3168 3323 3470 
95 -2447 -2680 2890 3082 -3259 3424 3578 3723 
99 2966 3198 3406 3594 3768 3929 4079 4219 
65 080 | 01849 | 02073 | 0-2276 | 0-403 | 0:2637 | 0-2800 | 0-2953 | 0:3098 
-85 -1971 2196 | 2401 -2589 2763 -2926 | -3079 | 3224 
90 -2131 2358 2504 2752 -2927 -3089 | 3242 3387 
95 2381 | 2610 -2816 | -3004 | 3178 | -3341 | 3493 | -3636 | 
-99 2889 -3117 .3321 | 3507 -3678 | -3837 | -3985 | -4125 | 
67 080 | 01798 | 0-2018 | 0-2217 | 02400 | 0-2571 | 0-2731 | 62881 | 0-3024 | 
-85 ‘1917 | 2138 2339 -2628 | -2604 | 2854 -3006 | -3148 
90 2074 2290 -2498 | 2683 2854 -3015 | -3165 | +3307 
-95 2318 2542 2745 -2080 | 3101 | 3201| -3411 -3552 
-99 2816 -3039 | 3240 -3424 | 3592 3740 3896 4034 
69 080 | 0.1751 | 01965 | 0:2160 | 0-2340 | 02508 | 0:2665 0.2813 | 0-2953 
85 1867 | -2083 | 2280 240 2629 | -2786 | -2935 3075 
90 2019 2238 -2435 | 2617 2784 9943 | 3092 3232 
:95 2250 -2479 2677 | -2859 | -3028 | -3185 | -3333 3473 
99 2746 2966 3163 3344 3511 3666 “3811 -3947 
71 0-80 | o-1706 | 0-1915 | 02107 | 02283 | 02448 | 02002 0.2748 0.2886 
| 85 -1819 | 2031 2223 -2401 250% | -2721 | 2867 | 1:3005 
-90 1968 2182 2376 2554 2720 2875 3021 3160 
:95 2202 2418 2613 27 | 2958 | 218 3259 3396 


99 2679 2896 -3090 | 8268 3482 35588 3729 36864 
| 73 0.80 0.1663 01868 | 0:2055 | 02229 0.2390 0.2542 0-2686 | 0-2822 


85 +1774 1981 -2170 +2344 -2506 2659 2803 2939 

90 | 1919 -2129 2319 2494 2658 | 2810 2954 3090 

95 2148 2360 +2552 2728 2891 -3044 +3188 -3323 

-99 | 2616 2828 -3020 +3195 3357 3509 3650 3784 

75 080 | 01622 0.1823 | 0-2007 0.2177 0.2336 0.2485 | 02626 0.2760 
85 1730 1934 -2119 -2290 -2450 +2599 -2741 +2875 

90 +1873 2078 2265 2437 2598 +2748 2890 -3024 

95 2097 +2305 2493 2666 2827 2978 3119 3253 

99 2555 2764 2953 3126 +3286 3435 3575 3707 

77 080 01583 | 0-1780 | 0-1960 0.2127 02283 0.2430 0.2569 | 02701 

| 85 1689 1888 -2070 +2238 2395 +2543 2682 2814 
| 90 1829 2030 -2214 -2383 +2541 2089 2828 2961 
95 -2048 2252 2437 2608 2766 -2914 3054 3186 

99 2498 2703 +2889 3059 3217 3364 3502 3633 

70 0-80 | 01546 | 0-1739 | 0-1916 0.200 0.2233 | 0.2378 02514 | 0-2644 

| 85 +1650 -1845 +2024 2189 -2343 -2488 -2625 -2756 
| 90 1787 1984 -2164 2331 2486 2632 2769 2900 
95 2002 2202 +2384 2551 2707 2853 2991 +3122 


+2645 2827 2995 3151 3296 3433 3562 


99 +2442 
81 0680 | 0.1510 0-1700 | 01874 0.2035 | 0-2185 0.2327 0.2462 | 0-2590 
85 1612 -1804 1979 -2141 2293 2436 2571 -2700 
90 +1746 -1940 2117 2281 +2434 2577 2713 +2842 
95 1957 2154 2333 2497 2651 +2795 2931 3060 
3088 +3231 3366 3494 


99 2390 2589 2769 2934 
0-1991 0.2140 | 02279 0:2412 0:2538 


pe ER peer en :2245 :2386 2519 2646 


85 1576 1765 1937 2096 

90 1708 1898 2072 2233 2383 +2525 s Ta 
95 1914 2108 +2284 +2446 2597 2739 2 

99 : 2712 +2875 3027 3168 -3302 3428 


2339 2535 * 


85 0-80 | 0-1444 | 0-1627 | 0-1794 | 0-1950 | 0-2096 | 0-2233 | 0.2364 
80 -1542 1727 | -1896 | -2053 | -2200 | 2338 2469 
-90 1671 -1858 2029 -2187 2335 2474 -2606 
-95 1874 2064 2236 2396 -2545 -2685 2818 2943 
99 | 2290 2483 | 2658 2819 :2968 ‘3108 | 3240 3365 
87 0-80 01413 | 0-1593 | 0:1757 | 0-1910 0-2053 | 0-2189 02317 0-2440 
85 -1509 1691 -1857 | 2011 2155 | 2292 -2421 -2544 
90 1635 1819 | -1987 | -2143 | 2289 2426 | 2556 2680 
-95 -1834 2021 | 2191 2349 2496 | -2634 | 2764 | -2888 
99 2244 2434 | -2606 +2764 3912 | 3050 3180 3304 
89 0-80 | 0-1384 | 0-1560 | 0-1721 0.1872 | 0-2013 0-2146 | 0.2273 | 0-2394 
85 1478 | -1656 1819 | -1971 2113 2247 2375 -2496 
90 1602 1782 -1948 | -2101 -2244 2379 | -2508 -2629 
95 ‘1797 | -1981 | 2148 2303 2448 | 2584 2713 -2835 
99 2199 2386 2556 -2712 2857 2994 3123 -3245 
91 0-80 | 0-1355 | 0-1528 | 0-1687 | 0-1835 | 0-1974 | 0-2105 | 0-2230 | 0-2349 
85 -1447 1623 | -1783 | 1932 2073 +2205 2330 2450 
90 -1569 1747 | -1909 | 2060 2201! 2335 2461 2581 
-95 -1761 1942 2106 2259 -2402 -2536 2663 2784 
-99 -2156 2340 2507 2661 2805 | 2940 3067 -3188 
93 0-80 | 0-1328 | 0-1498 0.1654 | 0-1800 | 0.1936 0-2066 | 02189 0.2306 
85 -1418 1591 -1749 -1895 2033 2164 2288 2406 
-90 1538 -1713 1872 -2021 2160 | 2291 -2416 -2535 
“95 1726 1904 2066 22¹7 2357 -2490 2615 -2734 
99 2115 -2296 | 2461 2613 275⁵ 2888 3014 3133 
95 0:80 | 0-1302 | 0-1469 | ©1622 | 0-1765 0.1900 | 0-2028 | 02149 | 02265 
-85 1391 1560 -1715 1860 1996 -2124 -2246 -2363 
-90 1508 1680 44837 1983 2120 -2250 2373 2490 
“95 1693 1868 2028 2176 +2314 2445 2569 2687 
99 -2075 -2253 -2416 2566 2706 2837 2962 3080 
97 080 | 0:1277 | 01441 | 0-1592 | 0-1733 0-1865 | 0.1991 02111 02225 
-85 -1364 1530 -1683 1826 1959 2086 -2206 2321 
90 1479 | 1648 1803 -1947 -2082 -2210 2331 +2447 
95 1661 -1833 -1990 2136 -2273 2402 | 2324 -2641 
99 2037 :3213 -2373 2521 2059 2789 2012 -3028 
99 080 | 031252 | 0-1414 | 01562 | 0-1701 | 0-1832 | 0-1956 | 0-2074 0.2186 
85 1338 “1502 -1652 -1792 1924 2049 -2168 -2281 
0 | -1451 | -1618 | -1770 | -1912 | -2045 | 2171 2291 | 2405 
95 -1630 -1800 -1955 -2098 -2233 -2360 -2481 2596 
99 2000 -2173 2331 2477 -2614 -2742 :2863 2979 
101 0-80 | 0.1229 | 0-1388 0.15834 0-1671 | 0-1799 | 0.1921 | 02038 | 0:2149 
-85 1313 1474 1622 1761 -1890 -2014 -2131 :2243 
-90 +1425 1588 1738 -1878 | 2009 -2134 +2252 2364 
-95 -1600 1767 1920 2062 22195 -2320 -2439 -2553 
99 1964 -2135 | 22901 2435 -2570 2696 2816 2931 
103 0-80 | 0-1207 | 01363 | 01507 0-1641 | 0-1768 0-1888 02003 0.2113 
85 1289 -1448 | -1594 ‘1730 | -1858 -1979 2095 2205 
MEA PME 2500 | :1708 | -1846 | -1975 |. -2008 | 2214 | 32325 
ME 1887 2026 2158 2281 2399 :3511 
:99 | 1930 2098 2252 | 2394 2527 2632 2771 2884 


107 


109 


111 


113 


115 


117 


119 


121 


123 


0-80 
85 
90 
95 
99 

0-80 
85 
90 
95 
99 

0-80 
85 
90 
95 
99 

0-80 
85 
-90 
95 
99 

0-80 
85 
90 
95 
99 

0-80 
85 
90 


99 




0:1185 
1266 
1374 
1544 
1896 
1164 
1244 
1350 
1517 
1864 
0:1144 
1223 
1327 
1492 
1833 
0-1124 
1202 
1304 
1467 
1803 
0-1105 
1182 
1283 
1443 
1774 
0-1087 
1162 
1262 
1419 
1746 


0:1070 
-1144 
+1242 
1397 
1719 

0:1052 
1125 
1222 
1375 
1692 

0:1036 
1108 
1203 
1354 
1667 

0:1020 
1091 
1184 
1333 
1642 


F. G. Foster 
5   6   7   8
0.1339 | 0-1480 | 01613 | 01738 
1422 -1566 -1700 | -1826 
1533 1678 -1814 1042 
1706 1854 -1992 +2122 
-2063 2214 23585 -2486 
0.1315 | 0-1455 | 01586 | 0-1709 
-1398 1539 | -1671 -1796 
1506 1650 | -1784 | 1910 
1677 1823 -1959 | 2087 
2028 2178 2316 -2446 
01293 | 01430 | 0-1559 | 0-1681 
-1374 1513 -1644 | -1766 
-1481 -1622 | 1754 | -1878 | 
1649 1793 -1927 | 2053 
-1995 -2143 | 2280 2408 
0.1271 | 0-1407 | 0-1533 | 0-1653 
1351 -1488 | -1617 1738 
1456 1596 -1726 | 1848 
1622 1764 | -1896 -2021 
1963 2108 2244 2370 
0.1250 0-1384 01509 | 0-1627 | 
1329 1464 | 1591 1710 
1432 1570 | -1698 | 1819 
1595 -1736 1866 1989 
1932 2075 | 2209 2334 
0.1230 | 0-1361 | 01485 | 0-1601 | 
-1307 -1441 1565 1683 
1409 -1545 1671 -1791 | 
1570 -1708 -1837 -1958 | 
1901 +2043 | -2175 -2299 
0.1210 | 01340 | 01461 | 0-1576 
+1286 -1418 -1541 -1657 
1387 +1521 1645 1763 
1545 -1682 1809 -1929 
1872 -2012 2143 -2265 
0.1191 0.1319 | 0-1439 | 0-1552 
1266 1396 1517 -1632 
1365 1497 +1620 1737 
1521 -1656 1782 -1900 
1844 -1982 2111 2232 
0.1172 0209 | 01417 | 01529 
1246 .1374 | 1494 -1608 
1344 44474 | 1596 1711 
1498 1631 1755 1872 
1816 -1953 2080 -2200 
031154 | 0-1279. 0.1396 | 0-1506 
1227 -1354 +1472 1584 
1324 -1452 +1572 -1686 
1476 1607 1729 -1845 
1789 -1925 +2050 2168 




0-1857 
:1946 
-2063 
-2244 
2610 


0:1826 
“1914 
2029 
+2208 
2509 

0-1796 
-1883 
:1996 
2172 
+2529 

0-1767 
1853 
1964 


2138 


0-1739 


2490 


1824 


+1934 
+2105 
+2452 


0-1712 
1795 
1904 
2073 
+2416 


| 0-1086 


1768 
1875 
-2042 
-2381 
0:1660 
-1741 
+1847 
-2012 
2340 


0-1636 
17116 
:1820 
1983 
2313 


0-1612 
1691 
1794 
1954 
+2280 


0:1970 
2060 
2178 
2360 
2727 


9.1937 
2020 
2142 
2322 
2685 

0:1906 
-1994 
2108 
-2286 
2643 


0-1876 
1962 
2075 
-2250 
:2603 


0-1847 
1932 
2043 
:3216 
:2565 


0-1818 
-1902 
-2012 
:2183 
2527 


0.1791 
1874 
1982 
+2150 
2490 


01764 
1846 
1953 
:2119 
:2455 


0:1738 
1819 
:1924 
2088 
2420 

0-1713 
1792 
1897 
2059 
2386 


02078 
2169 
+2288 
2471 


2839 


0-2044 
2134 
2251 


2432 


2795 
0-2012 
2100 
:2215 


2394 


2753 


0-1980 
2067 
2181 
2357 


2711 


0-1950 
2036 
-2148 
-2322 
-2671 

0-1920 
-2005 
-2115 
2287 
2633 

0.1891 
-1975 
2084 
-2254 
-2595 


0-1863 
1946 
2054 
2221 
2558 

0:1836 
-1918 
2024 
2189 
2523 

0-1809 
1890 
1995 
2159 
2488 


125 0:80 | 0-1004 | 0-1137 | 0-1260 | 0-1375 | 0-1484 | 0-1589 | 0-1688 | 0-1784 
-85 1074 | -1209 | -1334 | -1451 | -1561 1666 -1767 | -1864 
90 1167 1304 1431 -1549 1662 1768 1870 4907 
-95 1313 -1454 | -1583 “1704 | -1818 -1927 2030 2129 
-99 -1618 1763 -1897 | 2021 -2138 2249 | 2354 -2454 
127 080 | ©0989 | 01120 | 0-1241 | 0-1355 | 0-1463 | 01566 | 01664 | 0-1759 
| -85 -1058 | -1191 -1314 1430 -1539 | -1643 -1742 | «1898 
| -90 1149 | -1285 | -1410 -1527 -1638 -1743 1844 -1940 
-95 -1294 | -1433 | -1561 -1680 | -1793 -1900 | 2002 2100 
99 | -1594 1738 +1870 -1993 | 2109 -2218 | 2322 -2421 
129 0:80 | 0-0975 | 0-1104 | 0-1223 | 01336 | 01442 | 0-1544 | 01641 | 0:1735 
85 -1043 ‘1174 | -1295 -1409 | -1517 1620 -1718 | +1812 
90 1132 1266 -1390 1506 -1615 1719 -1818 | -1914 
-95 1275 | -1412 1538 -1656 | 1768 1874 | 1975 2071 
-99 -1572 1714 | -1844 -1966 | 2080 :2188 | 22291 2389 
131 080 | 0-0960 | 0-1088 | 0-1206 0.1317 0-1422 | 0-1523 | 01619 01710 
85 -1027 3157 1277 -1389 | -1496 | 1597 -1695 | -1788 
-90 1116 -1248 | -1370 1485 | -1593 1695 | -1794 | +1888 
-95 -1257 1392 1517 1634 -1744 1848 -1948 -2044 
99 1549 1690 -1819 1939 | 2052 2159 22261 2358 
133 0-80 | 00947 | 0-1072 | 01189 | 0-1299 | 0-1403 | 0-1502 0-1597 0.1688 
-85 -1013 -1141 -1259 1370 | -1475 1576 | -1672 1764 
-90 1100 +1231 -1351 1464 | -1571 1672 -1770 | -1863 
-95 1239 1373 1496 -1611 -1720 1823 -1922 -2017 
-99 -1528 | -1667 1794 1913 2025 -2131 -2232 2328 
135 0-80 | 00933 | 0-1057 | 0-1173 0-1281 | 0-1384 | 01482 | 01575 0.1666 
85 -0998 | -1125 -1242 -1351 -1455 1555 -1650 -1741 
-90 -1085 -1213 1333 1444 1550 1650 -1746 1839 
95 1222 1354 1476 1589 1697 1799 1897 1991 
99 1507 +1644 11770 1888 1998 2103 2203 :2298 
137 0-80 | 0-0920 | 0-1043 | 0-1157 | 0-1263 | 01365 | 0,1462 | 01555 01644 
85 -0985 1109 -1225 1333 | -1436 1534 1628 1718 
90 1070 | -1197 1314 +1425 -1529 -1628 1724 :1815 
-95 1205 | -1335 | -1456 -1568 | -1675 1776 | 1872 -1965 
99 1487 -1622 1747 11863 | -1972 2076 22175 2209 
139 0:80 | 0.0908 | 0-1029 | 0-1141 | 0-1247 | 0:1347 | 01443 | 01534 01623 
85 -0971 -1094 -1208 1315 | -1417 -1514 -1607 -1696 
-90 +1055 “1181 +1297 “1406 1509 -1607 4701 | 1792 
95 | -1189 | -1317 | -1436 | -1548 | -1653 | -1753 | -1849 | 1940 
99 1467 1601 -1724 1839 | -1947 2050 2148 2241 
141 080 | 0-0895 | 0-1015 | 0-1126 | 0-1230 | 0-1329 | 0-1424 | 0,1518 0,1602 
85 | 0958 1079 -1192 | 1298 +1399 31494 | -1586 +1675 
'90 | -1041 | -1165 | -1280 | -1388 | -1489 | -1587 | -1680 | +1769 
-95 ‘1173 | -1300 | +1418 1528 -1632 1731 -1825 -1916 
99 1448 1580 1702 -1815 | -1923 2024 2121 -2214 
143 080 | 0:0883 | 01001 | O-1111 | O-1214 | 0-1312 | 0-1406 | 0-1496 | 01582 
85 0948 -1065 | -1177 | -1281 | -1381 -1475 | -1566 | 1004 
90 1027 1150 1263 1370 1470 1567 -1659 1747 
95 1157 1283 1399 1508 -1611 -1709 -1803 -1892 
99 1429 | -1560 | -1680 1793 -1899 1999 | 2095 :2187 
h | 


155


157 


159 


0-0988 
1051 
1135 
1266 
1540 


0:0975 
1038 
1120 
1250 
1521 


0-0963 
:1025 
:1106 
1235 
1502 


0-0951 
1012 
1092 
1219 
1483 


0-0939 
0999 
1079 
1204 
1466 


0-0927 
0987 
1066 
1190 
1448 


0:0916 
0975 
1053 
1176 
1431 


0-0905 
0963 
:1040 
:1162 
1414 


0-0894 
0952 
1028 
“1148 
1398 


0-0884 
0941 
1016 
1135 
1382 


0-1097 
1161 
1247 
1381 
1659 


0-1082 
1147 
:1231 
:1364 
1639 


0-1069 
:1132 
:1216 

1347 
1618 


0-1055 
-1118 
-1201 
-1330 
1599 

0-1042 
-1104 
1186 
-1314 
-1580 


0-1030 
-1091 
1172 
1299 
1561 


0-1017 
1078 
1158 
1283 
1543 


0-1005 
-1065 
“1144 
+1268 
+1525 


0:0993 
1053 
1131 
1253 
1508 


0-0982 
-1040 
1118 
1239 
1491 


0-1199 
1265 
1352 
1489 
1770 


0-1183 
1249 
1335 
1471 
-1749 


0-1169 
+1233 
1319 
+1452 
1727 


0:1154 
+1218 
+1302 
+1435 
1707 


0-1140 
1203 
1287 
1417 
1687 


0-1126 
+1189 
1271 
1401 
1667 


0-1113 
1175 
:1256 
1384 
+1648 


0-1100 
“1161 
+1241 
+1368 
1629 


0-1087 
“1147 
+1227 
+1352 
1610 


0:1074 
-1134 
+1213 
1337 
+1592 


0:1295 
+1363 
-1452 

1591 

1875 


0-1279 
1346 
1434 
1571 
1853 


0-1263 
1329 
1416 
+1552 
1830 


0-1248 
+1313 
1399 
1533 
1809 


0:1233 
1297 
1382 
“1515 
1787 


0-1218 
+1282 
1366 
1497 
1767 


0-1203 
1267 
1350 
1480 
1746 


0-1189 
-1252 
1334 
1463 
1727 


0:1176 
1237 
1319 
+1446 
1707 

0.4162 
+1223 
+1304 
-1430 
1688 


0-1388 
“1457 
1547 
1688 
1975 


0:1371 

:1439 
1528 
1667 
1951 


0-1354 
1421 
1509 
1647 
1928 


0-1337 
:1404 
1491 
1627 
1905 


0-1321 
1387 
1473 
1608 
1883 

0-1306 
-1371 
1456 
1589 
1861 


0-1290 
+1355 
1439 
1571 
1840 


011275 
+1339 
+1422 
+1553 
+1820 


0:1261 
:1324 
+1406 - 
1535 
1799 

0:1246 
1309 
1390 
1518 
1780 


0:1477 
1547 
1638 
1780 
2070 


0-1459 
+1528 
1618 
1759 
:2045 

01441 
1509 
1598 
1738 
2021 


0:1423 
-1491 
1579 
1717 
1998 


0.1406 
1473 
1561 
1697 
1975 


0-1390 
1456 
+1542 
1677 
+1952 


0:1374 
1439 
+1525 
+1658 
1930 


0-1358 
1423 
1507 
1639 
1909 


0:1342 
1406 
1490 
1621 
1888 


0:1327 
1391 
1474 
1603 
1867 
0-1562
1633 
-1726 
1869 
2161 


0-1543 
1613 
170⁵ 
1847 
2135 


0-1525 
-1594 
1684 
+1825 
+2110 

0-1506 
1575 
1664 
1803 
-2086 


0-1488 
+1556 
+1645 
1783 
2062 


0.1471 
1538 
1626 
1762 
2039 


0:1454 
-1520 
-1607 
+1742 
-2016 


0:1437 
+1503 
1589 
+1722 
1994 


0-1421 
+1486 
+1571 
1703 
1972 

0:1405 
-1470 
+1554 
1685 
+1951 


167


169 


171 


175 


177 


179 


181 


173. 


Upper percentage points of the generalized Beta distribution. III


8 9 10 

0.149 | 0-1232 | 0-1312 | 0-1390 

-1210 1294 | -1375 | -1453 

-1289 1375 | -1457 | -1587 

4414 1502 -1586 | -1666 

-1670 1760 -1847 | -1930 

0-0761 | 0-0864 | 0-0960 | 0-1050 | 01136 | 01219 | 0-1298 | 0-1375 

0815 |- 0919 -1017 -1109 | -1196 -1280 | -1360 | -1438 

-0886 |- -0993 | -1092 1186 -1275 -1360 | -1441 -1520 

0999 | -1109 | -1211 1307 -1398 1485 | -1568 | +1648 

-1236 -1351 1458 -1558 | -1652 -1741 -1827 -1910 

0:0752 | 00854 | 0-0949 | 0-1038 | 01124 | 0-1205 01284 0-1360 

0805 | -0909 | -1005 1096 -1183 1266 -1345 | -1422 

-0876 | 0982 1080 1173 -1201 -1345 | -1426 | 1504 

0988 | -1097 1198 1293 | 1383 1469 -1552 | 1631 
1222 -1336 1442 +1541 -1634 -1723 | -1808 | 1890 

0:0744 | 00844 | 0-0938 | 0-1027 | 0-1111 | 0-1192 | 0-1270 | 0-1345 

0796 -0899 | 0994 1084 -1170 1252 -1331 1407 

0866 0971 1068 1160 +1247 1331 “1411 -1488 

-0977 -1085 1185 1279 -1368 1453 -1535 1614 

1209 -1322 1426 1524 | -1617 -1705 | -1789 | -1870 

0:0735 | 00835 | 0-0928 | 0-1016 | 0-1099 | 0-1179 | 0-1257 | 0-1331 

-0787 0889 0983 1073 -1157 1239 1317 1392 

0856 0960 1057 1147 -1234 -1316 | -1396 | 1472 

0966 -1073 1172 -1265 | -1354 -1438 | -1519 | +1697 

1196 1308 -1411 1508 -1600 1687 -1771 -1851 

0:0727 | 00826 | 0-0918 | 0-1005 | 0-1088 | 0-1167 | 01243 | 01317 

0779 | 0879 0973 -1061 -1145 1226 -1303 -1378 

-0847 | 0950 1045 1135 -1221 -1303 1381 -1457 

-0955 -1061 1159 -1252 1339 1423 1503 +1580 

1183 1294 1396 +1492 -1583 -1670 1752 -1832 

0-0719 | 00817 | 0-0908 | 00994 | 0-1076 | 0-1155 | 0-1230 | 0-1303 

0770 | -0870 | 0962 1050 | -1133 -1213 1289 +1363 

0838 | 0939 -1034 1123 | -1208 1289 1367 +1442 

0945 | 1050 1147 1239 | -1325 -1408 1488 -1564 

‘1170 | -1280 1382 -1477 | -1567 -1653 1735 1814 

0:0712 | 0-0808 | 0-0898 | 0-0984 | 0-1065 | 0-1143 | 0-1218 | 01290 

0762 | -0860 | -0952 1039 -1121 1200 +1276 1349 

0829 0929 -1023 -1111 -1195 -1276 1353 “1427 

0935 | 4039 1135 1226 -1312 -1394 1473 1549 

-1158 4267 1367 1462 -1551 1636 41717 1796 

0:0704 | 00800 | 0-0889 | 0-0973 | 0-1054 | O-1131 | 0-1205 01277 

: 1110 -1188 | -1263 | +1836 

1183 1263 1339 -1413 

1298 -1380 1458 +1533 

+1535 +1620 1700 -1778 

01043 | O-1119 0.1193 01204 

098 | -1176 | -1250 | 4322 

“1171 -1250 1326 1399 

-1285 1366 -1443 | 1518 

1520 -1603 | 1684 4761 


187 0-80 


189 0-80 


191 O 


193 080 


195 0-80
ESTIMATION OF PARAMETERS OF MIXED EXPONENTIALLY
DISTRIBUTED FAILURE TIME DISTRIBUTIONS 
FROM CENSORED LIFE TEST DATA* 


By WILLIAM MENDENHALL 


North Carolina State College and Research Techniques Unit, 
London School of Economics 


AND R. J. HADER 
North Carolina State College 


SUMMARY. Statistical methods in life testing analysis have been developed in the past primarily 
for the case of a single failure population. In this paper a failure population which can be divided 
into subpopulations, each representing a different type or cause of failure, is considered. Estimates 
of the population parameters are obtained in the case where the subpopulations are exponentially 
distributed and sampling is censored at a predetermined test termination time. 


1. INTRODUCTION 


Mixed failure populations are encountered in many fields of applied science. In particular, 
engineers have been cognizant of a phenomenon described as ‘early failures’ in tests on 
electronic tubes and other devices. It has frequently been observed that the failure rate is
initially relatively high, then actually decreases with increasing age. As the item becomes 
still older the failure rate either becomes constant or again increases with age depending on 
the basic failure mechanism involved. This behaviour suggests strongly that the population 
is not homogeneous but rather is made up of several subpopulations mixed in unknown 
proportions. 

For practical purposes, the engineer may divide the failures of a system, or a device, into 
two or more different types of causes. An example is presented by Acheson & McElwee 
(1951), who divided electronic tube failures into gaseous defects, mechanical defects, and 
normal deterioration of the cathode. One would like to know the fraction of the population 
which will fail due to each cause in order to optimally concentrate effort on redesign of the 
system or to improve manufacturing methods. In addition, it would be desirable to know 
the distribution of failure for each type or cause of failure, for example, in order to institute 
an ageing process to eliminate early failures from production. Other references to mixed 
failure populations are made in papers by Davis (1952), Epstein (1953), Herd (195?),
Madison (1955), Steen (1952), and Wilde (1952).

This paper will be concerned with the problem of estimating the parameters of a mixed
population model based upon a sample censored at a fixed test termination time. Attention
will be directed primarily to the case of two subpopulations of failure, each exponentially
distributed, because of the light which these results shed on the general problem of estimating
parameters from mixed population models. The problems of estimation in the case of any
number of failure subpopulations, each distributed according to a Weibull distribution, will
be considered briefly in § 8.


* The authors are indebted to the General Electric Company for sponsorship and financial support
in this research. The results of this … of Statistics, North Carolina State College, … Series, No. 171.
It should be noted at this point that the authors are not attempting to establish a case for
the practicality of the exponential distribution in describing subpopulation failure. Ex- 
ponential failure distributions do occur often in practice, as noted by Davis (1952), Epstein 
(1953), and others, and this fact provides sufficient justification for their use as a starting- 
point in the consideration of mixed failure population problems. 


2. THE POPULATION AND THE MODEL 


A population is postulated which is composed of $s = 2$ subpopulations, representing failure
types, mixed in proportion $p : (1 - p)$, where $0 < p < 1$. For simplicity of notation, let $q = 1 - p$.
Each unit of the population conceptually contains a tag which indicates the subpopulation 
to which the unit belongs and hence defines the way in which that particular unit will fail. 
The information on the tag, i.e. the cause of failure, is obtained only after failure has occurred. 

The failure times for the ith subpopulation, i = 1, 2, are assumed to have a cumulative 
failure probability distribution defined by 


$$F_i(t) = 1 - e^{-t/\alpha_i} \quad (0 \le t < \infty). \tag{2·1}$$


Here and hereafter, $i$ may take the values 1 or 2. If $p$ is the proportion of units belonging to
subpopulation $i = 1$, then the cumulative distribution function for the population is

$$F(t) = pF_1(t) + qF_2(t), \tag{2·2}$$

and the density function is

$$f(t) = pf_1(t) + qf_2(t). \tag{2·3}$$

Also let

$$G_i(t) = 1 - F_i(t) \tag{2·4}$$

and

$$G(t) = 1 - F(t). \tag{2·5}$$


The probability function, $G(t)$, is the probability that a unit will survive to time $t$ and is
called the survival function.

If the entire population were put on test (or into service), the proportion of items belonging
to each subpopulation would, in general, change with time, because the items from
one subpopulation would die off more rapidly than those from the other. At time $t$, the
subpopulations would be mixed in the proportions $p(t) : 1 - p(t)$. The quantities $p(t)$ and
$q(t) = 1 - p(t)$ will be called the conditional mixture proportions. Since a unit surviving to time $t$
belongs to subpopulation (1) with probability $pG_1(t)/G(t)$,

$$p(t) = \frac{pG_1(t)}{G(t)}, \qquad q(t) = \frac{qG_2(t)}{G(t)}, \tag{2·6}$$

and obviously $p(0) = p$, $q(0) = q$.

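As a numerical illustration of (2·1) to (2·6), the short Python sketch below evaluates the survival function $G(t)$ and the conditional mixture proportion $p(t)$. The function names and the parameter values are hypothetical: a 30 % short-lived subpopulation with mean life 100 hours is mixed with a long-lived one of mean 1000 hours. The sketch only shows how quickly $p(t)$ falls as the short-lived units die off.

```python
import math

def survival(t, p, alpha1, alpha2):
    """G(t) = p*G1(t) + q*G2(t), with G_i(t) = exp(-t/alpha_i); see (2.1)-(2.5)."""
    q = 1.0 - p
    return p * math.exp(-t / alpha1) + q * math.exp(-t / alpha2)

def conditional_mixture(t, p, alpha1, alpha2):
    """p(t) = p*G1(t)/G(t): share of units still surviving at t that belong to subpopulation (1)."""
    return p * math.exp(-t / alpha1) / survival(t, p, alpha1, alpha2)

# Hypothetical mixture: 30% with mean life 100 h, 70% with mean life 1000 h.
for t in (0.0, 100.0, 500.0):
    print(t, conditional_mixture(t, 0.3, 100.0, 1000.0))
```

At $t = 0$ the function returns $p$ itself. It then decreases because the subpopulation with the smaller mean is depleted first.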
3. SAMPLING 


Due to restrictions on the time available for testing, the experimenter frequently wishes to
conclude the life test after a predetermined length of time has elapsed or after a predeter-
mined number of units have failed. Sampling of this type is known as censored sampling.
This paper will consider sampling censored at a fixed test termination time. A random
sample of $n$ units is drawn from the population and placed on test. The test is terminated
at a fixed time, $T$, at which time $r$ units have failed, $r_i$ from subpopulation (i), with $r_1 + r_2 = r$.
The time of failure of the $j$th unit from subpopulation (i), $t_{ij}$, is observed. It will be
assumed that $j$ always ranges from $j = 1$ to $r_i$ when not specified. The $(n - r)$ units which
have not failed yield no information as to the subpopulation from which they were drawn.
The random variables observed are therefore the $r_i$ and the $t_{ij}$, $j = 1, \ldots, r_i$ and $i = 1, 2$.
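The sampling scheme can be mimicked by simulation. The sketch below is a minimal illustration that assumes the exponential model (2·1). The function name is hypothetical, $n$ and $T$ are borrowed from the example of § 5, and $p$ and the mean lives are made up. Each unit receives a hidden tag, and an exponential failure time is drawn for its subpopulation. Only units that fail before $T$ contribute an observed time and cause; the survivors are merely counted.

```python
import random

def simulate_censored_sample(n, p, alpha1, alpha2, T, seed=None):
    """Return {1: [...], 2: [...]}: ordered failure times (< T) by cause."""
    rng = random.Random(seed)
    failures = {1: [], 2: []}
    for _ in range(n):
        cause = 1 if rng.random() < p else 2       # the unit's hidden "tag"
        mean = alpha1 if cause == 1 else alpha2
        t = rng.expovariate(1.0 / mean)            # exponential time with this mean
        if t < T:
            failures[cause].append(t)              # failure time and cause both observed
        # a unit with t >= T is a survivor: neither its time nor its cause is seen
    for times in failures.values():
        times.sort()                               # ordered observations t_i1 <= t_i2 <= ...
    return failures

sample = simulate_censored_sample(n=369, p=0.3, alpha1=150.0, alpha2=400.0, T=630.0, seed=1)
r1, r2 = len(sample[1]), len(sample[2])
print("r1 =", r1, "r2 =", r2, "survivors n - r =", 369 - r1 - r2)
```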
4. ESTIMATION OF PARAMETERS


Case A. Relative magnitude of subpopulation parameters not known
Estimates of the population parameters are obtained by the method of maximum likelihood.
For convenience, assume that all measurements of time are in units of size $T$, the test
termination time. Therefore let $x = t/T$ and let $\beta_i = \alpha_i/T$. Then

$$F_i(x) = 1 - e^{-x/\beta_i} \quad (0 \le x < \infty). \tag{4·1}$$
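Given a random sample of $n$ units, the probability of $r_1$ units failing due to cause (1), $r_2$
units failing due to cause (2), and $(n - r)$ units surviving is the multinomial

$$P(r_1, r_2) = \frac{n!}{r_1!\, r_2!\, (n - r)!}\,[pF_1(1)]^{r_1}\,[qF_2(1)]^{r_2}\,[G(1)]^{n - r}. \tag{4·2}$$

The conditional density of obtaining the ordered observations $x_{11}, \ldots, x_{1r_1}, x_{21}, \ldots, x_{2r_2}$, given the $r_i$ and
$x_{ij} < 1$, is

$$p(x_{11}, \ldots, x_{2r_2} \mid r_1, r_2) = \prod_{i=1}^{2}\left\{ r_i! \prod_{j=1}^{r_i} \frac{f_i(x_{ij})}{F_i(1)} \right\}, \tag{4·3}$$

where $f_i(x) = \beta_i^{-1} e^{-x/\beta_i}$. It then follows that the likelihood, $L$, for the sample is

$$L = \frac{n!}{(n - r)!}\,[G(1)]^{n - r}\,p^{r_1} q^{r_2} \prod_{j=1}^{r_1} f_1(x_{1j}) \prod_{j=1}^{r_2} f_2(x_{2j}). \tag{4·4}$$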
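Taking the first partial derivatives of $\ln L$,

$$\frac{\partial \ln L}{\partial \beta_1} = \frac{k(n - r)}{\beta_1^2} - \frac{r_1}{\beta_1} + \frac{\sum_j x_{1j}}{\beta_1^2}, \tag{4·5}$$

$$\frac{\partial \ln L}{\partial \beta_2} = \frac{(1 - k)(n - r)}{\beta_2^2} - \frac{r_2}{\beta_2} + \frac{\sum_j x_{2j}}{\beta_2^2}, \tag{4·6}$$

$$\frac{\partial \ln L}{\partial p} = \frac{k(n - r) + r_1}{p} - \frac{(1 - k)(n - r) + r_2}{q}, \tag{4·7}$$

where

$$k = \frac{p\,e^{-1/\beta_1}}{p\,e^{-1/\beta_1} + q\,e^{-1/\beta_2}} \tag{4·8}$$

or, equivalently,

$$k = \frac{1}{1 + (q/p)\exp(1/\beta_1 - 1/\beta_2)}. \tag{4·9}$$

Referring to equation (2·6), it can be seen that $k = p(1)$, the conditional mixture proportion
at the test termination time, $x = 1$.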
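The partial derivatives (4·5) to (4·7) can be checked numerically. The following sketch assumes that failure times are already expressed in units of $T$. It codes the logarithm of (4·4), omitting the constant $n!/(n-r)!$, and compares the closed-form scores with central finite differences. The function names, data and parameter values are hypothetical.

```python
import math

def log_likelihood(beta1, beta2, p, x1, x2, n):
    """ln L of (4.4) without the constant ln[n!/(n - r)!]; x1, x2 are failure times in units of T."""
    q = 1.0 - p
    r1, r2 = len(x1), len(x2)
    G_at_1 = p * math.exp(-1.0 / beta1) + q * math.exp(-1.0 / beta2)   # G(1): survival to x = 1
    ll = (n - r1 - r2) * math.log(G_at_1) + r1 * math.log(p) + r2 * math.log(q)
    ll += sum(-math.log(beta1) - x / beta1 for x in x1)                # ln f1(x)
    ll += sum(-math.log(beta2) - x / beta2 for x in x2)                # ln f2(x)
    return ll

def k_value(beta1, beta2, p):
    """k = p(1), equation (4.8)."""
    a = p * math.exp(-1.0 / beta1)
    return a / (a + (1.0 - p) * math.exp(-1.0 / beta2))

def scores(beta1, beta2, p, x1, x2, n):
    """Closed-form partial derivatives (4.5), (4.6) and (4.7)."""
    r1, r2 = len(x1), len(x2)
    m, k, q = n - r1 - r2, k_value(beta1, beta2, p), 1.0 - p
    d_beta1 = k * m / beta1**2 - r1 / beta1 + sum(x1) / beta1**2
    d_beta2 = (1.0 - k) * m / beta2**2 - r2 / beta2 + sum(x2) / beta2**2
    d_p = (k * m + r1) / p - ((1.0 - k) * m + r2) / q
    return d_beta1, d_beta2, d_p

# Hypothetical data: n = 10 units, 3 failures from cause (1), 2 from cause (2).
x1, x2, n = [0.1, 0.3, 0.5], [0.2, 0.8], 10
theta, h = (0.4, 1.5, 0.4), 1e-6        # (beta1, beta2, p) and the difference step
analytic = scores(*theta, x1, x2, n)
for i in range(3):
    up, dn = list(theta), list(theta)
    up[i] += h
    dn[i] -= h
    numeric = (log_likelihood(*up, x1, x2, n) - log_likelihood(*dn, x1, x2, n)) / (2 * h)
    print(i, analytic[i], numeric)
```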
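When the partial derivatives are equated to zero the estimating equations are

$$\hat\beta_1 = \bar x_1 + \frac{\hat k (n - r)}{r_1}, \tag{4·10}$$

$$\hat\beta_2 = \bar x_2 + \frac{(1 - \hat k)(n - r)}{r_2}, \tag{4·11}$$

$$\hat p = \frac{r_1 + \hat k (n - r)}{n}, \tag{4·12}$$

where

$$\hat k = \frac{1}{1 + (\hat q/\hat p)\exp(1/\hat\beta_1 - 1/\hat\beta_2)} \tag{4·13}$$

and $\bar x_i = \sum_j x_{ij}/r_i$ is the mean failure time of the $r_i$ units that failed from subpopulation (i).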
The estimates of $\beta_1$, $\beta_2$ and $p$ must be obtained from the solution of the simultaneous
equations (4·10), (4·11), (4·12) and (4·13). Using equations (4·10), (4·11) and (4·12) to substitute
for $p$, $\beta_1$ and $\beta_2$ in equation (4·13) yields a single equation, involving only $\hat k$, of the form

$$\hat k = g(\hat k),$$

where $g(\hat k)$ is a function of $\hat k$. Since $\hat k$ is bounded, $0 < \hat k < 1$, it is relatively easy to obtain $\hat k$ by
plotting $g(\hat k) - \hat k$ against $\hat k$ and locating the point where $g(\hat k) - \hat k = 0$. The function
$g(\hat k) - \hat k$ will be positive or zero when $\hat k = 0$, and negative or zero when $\hat k = 1$, since $g$ always
lies between 0 and 1; a root therefore always exists in the interval.
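Instead of the graphical search, the root of $g(\hat k) - \hat k$ can be located by bisection, because the function is non-negative at $\hat k = 0$ and non-positive at $\hat k = 1$. The sketch below is one way to do this, not the authors' procedure. The helper names and the summary statistics in the usage line are hypothetical, and the code assumes $r_1, r_2 > 0$.

```python
import math

def estimates_given_k(k, n, r1, r2, xbar1, xbar2):
    """Equations (4.10)-(4.12): beta1, beta2 and p as functions of k (times in units of T)."""
    m = n - r1 - r2                        # number of survivors, n - r
    beta1 = xbar1 + k * m / r1             # (4.10)
    beta2 = xbar2 + (1.0 - k) * m / r2     # (4.11)
    p = (r1 + k * m) / n                   # (4.12)
    return beta1, beta2, p

def g(k, n, r1, r2, xbar1, xbar2):
    """Right-hand side of (4.13) after substituting (4.10)-(4.12)."""
    beta1, beta2, p = estimates_given_k(k, n, r1, r2, xbar1, xbar2)
    v = ((1.0 - p) / p) * math.exp(1.0 / beta1 - 1.0 / beta2)
    return 1.0 / (1.0 + v)

def solve_k_bisection(n, r1, r2, xbar1, xbar2, tol=1e-10):
    """Root of g(k) - k on [0, 1]; it is >= 0 at k = 0 and <= 0 at k = 1 because 0 < g < 1."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid, n, r1, r2, xbar1, xbar2) - mid > 0.0:
            lo = mid      # root lies to the right of mid
        else:
            hi = mid      # root lies at or to the left of mid
    return 0.5 * (lo + hi)

# Hypothetical summary statistics: n = 200, r1 = 77, r2 = 47.
k_hat = solve_k_bisection(n=200, r1=77, r2=47, xbar1=0.263, xbar2=0.4585)
print(k_hat, estimates_given_k(k_hat, 200, 77, 47, 0.263, 0.4585))
```

Bisection is slower than the iteration described below, but it cannot leave the interval $[0, 1]$.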


[Fig. 1. Maximum likelihood estimate of $\beta$ as a function of $\bar x$, based on a sample from a truncated exponential distribution. Measurements expressed in units of truncation time $T$.]

A good first approximation to $\hat k$ can be obtained by using a modification of the maximum
likelihood estimate obtained by Deemer & Votaw (1955) for the case of samples drawn from
a single truncated exponential distribution. The maximum likelihood estimate of $\beta$, where
the distribution is assumed to be truncated at time $T$, is the solution of

$$\hat\beta - \left(e^{1/\hat\beta} - 1\right)^{-1} = \bar x. \tag{4·14}$$

The solutions, $\hat\beta$, can be obtained graphically from Fig. 1, where $\hat\beta$ is given as a function of $\bar x$.
Choose the smaller $\bar x_i$ and identify this as subpopulation (1). Obtain the corresponding
$\hat\beta_{10}$ from Fig. 1. Then, substituting into

$$\hat\beta_{10} = \bar x_1 + \frac{\hat k_0 (n - r)}{r_1}, \tag{4·15}$$

solve for $\hat k_0$.
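Equation (4·14) has no closed-form solution. Its left-hand side is the mean of an exponential distribution with mean $\beta$ truncated at $x = 1$, and this mean increases steadily from 0 towards $\tfrac12$ as $\beta$ grows. The equation can therefore be solved by bisection instead of being read from Fig. 1, and a finite solution needs $0 < \bar x < \tfrac12$. The sketch below is a numerical stand-in for the graph, not the authors' method. The function names and the example values are hypothetical.

```python
import math

def truncated_mean(beta):
    """beta - 1/(e^{1/beta} - 1): mean of an exponential with mean beta truncated at x = 1.
    Written with exp(-1/beta) so that small beta does not overflow."""
    e = math.exp(-1.0 / beta)
    return beta - e / (-math.expm1(-1.0 / beta))

def beta_hat_truncated(xbar, tol=1e-12):
    """Solve (4.14), beta - 1/(e^{1/beta} - 1) = xbar, for beta by bisection."""
    if not 0.0 < xbar < 0.5:
        raise ValueError("a finite solution of (4.14) needs 0 < xbar < 1/2")
    lo, hi = 1e-9, 1.0
    while truncated_mean(hi) < xbar and hi < 1e12:
        hi *= 2.0                      # widen the bracket until it contains the root
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid) < xbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def first_approximation_k(n, r, r1, xbar1):
    """k0 from (4.15): beta10 = xbar1 + k0 (n - r) / r1, solved for k0.
    Capping k0 at 1 is our own safeguard, not part of the text."""
    beta10 = beta_hat_truncated(xbar1)
    return min((beta10 - xbar1) * r1 / (n - r), 1.0)

# Hypothetical subpopulation (1): xbar1 = 0.263 from r1 = 77 failures; n = 200, r = 124.
print(beta_hat_truncated(0.263), first_approximation_k(n=200, r=124, r1=77, xbar1=0.263))
```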
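The quantity $D_0 = g(\hat k_0) - \hat k_0$ can now be computed. Since $g(0) > 0$, the value of $\hat k$ which
satisfies $D = 0$, and hence provides a solution to equations (4·10), (4·11), (4·12) and (4·13), must be
$\hat k < \hat k_0$ or $\hat k > \hat k_0$, depending upon whether $D_0$ is negative or positive. Letting

$$v = \frac{\hat q}{\hat p}\exp\left(\frac{1}{\hat\beta_1} - \frac{1}{\hat\beta_2}\right), \tag{4·16}$$

so that $g(\hat k) = (1 + v)^{-1}$,

$$D = g(\hat k) - \hat k, \tag{4·17}$$

$$\frac{dD}{d\hat k} = -g(\hat k)^2\,\frac{dv}{d\hat k} - 1, \tag{4·18}$$

where, from (4·10), (4·11) and (4·12),

$$\frac{dv}{d\hat k} = -v(n - r)\left[\frac{1}{n\hat p} + \frac{1}{n\hat q} + \frac{1}{r_1\hat\beta_1^2} + \frac{1}{r_2\hat\beta_2^2}\right].$$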
Choosing $dD = -D_0$,

$$\hat k_1 = \hat k_0 + d\hat k, \tag{4·19}$$

where

$$d\hat k = \frac{D_0}{1 + g(\hat k_0)^2\,(dv/d\hat k)_0}. \tag{4·20}$$

This iterative process can be repeated until the desired degree of accuracy has been obtained.
The estimates of $\beta_1$, $\beta_2$ and $p$ are then obtained by substituting the solution for $\hat k$ into
estimating equations (4·10), (4·11) and (4·12).
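The iteration (4·16) to (4·20) amounts to a Newton step on $D(\hat k)$. The sketch below follows those equations with one addition of our own: each new $\hat k$ is clamped to $[0, 1]$, because a Newton step can overshoot when $dD/d\hat k$ is small. The starting value should come from (4·15). The function names and the hypothetical summary statistics are illustrative only.

```python
import math

def _terms(k, n, r1, r2, xbar1, xbar2):
    """n - r and the estimates (4.10)-(4.12) at a given k."""
    m = n - r1 - r2
    beta1 = xbar1 + k * m / r1            # (4.10)
    beta2 = xbar2 + (1.0 - k) * m / r2    # (4.11)
    p = (r1 + k * m) / n                  # (4.12)
    return m, beta1, beta2, p

def newton_k(n, r1, r2, xbar1, xbar2, k0, tol=1e-12, max_iter=100):
    """Iterate k_{i+1} = k_i + dk with dk = D / (1 + g(k)^2 dv/dk)."""
    k = k0
    for _ in range(max_iter):
        m, beta1, beta2, p = _terms(k, n, r1, r2, xbar1, xbar2)
        q = 1.0 - p
        v = (q / p) * math.exp(1.0 / beta1 - 1.0 / beta2)     # (4.16)
        gk = 1.0 / (1.0 + v)                                 # g(k)
        D = gk - k                                            # (4.17)
        dv_dk = -v * m * (1.0 / (n * p) + 1.0 / (n * q)
                          + 1.0 / (r1 * beta1 ** 2) + 1.0 / (r2 * beta2 ** 2))
        dk = D / (1.0 + gk ** 2 * dv_dk)                      # (4.20)
        k = min(1.0, max(0.0, k + dk))                        # (4.19), clamped (our safeguard)
        if abs(dk) < tol:
            break
    m, beta1, beta2, p = _terms(k, n, r1, r2, xbar1, xbar2)
    return k, beta1, beta2, p

# Hypothetical summary statistics, started from a first approximation k0 = 0.04.
print(newton_k(200, 77, 47, 0.263, 0.4585, k0=0.04))
```

Near the root the steps shrink rapidly. From a poor starting value, the bisection search is the safer choice.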

If $r_i = 0$, the estimating equations give no estimate of $\beta_i$. This does not offer difficulty in
a practical sense since, in this case, it is reasonable to conclude either that $\beta_i$ is very large or
that the corresponding mixture proportion ($p$ for $i = 1$, $q$ for $i = 2$) is zero. Let us adopt, as a
convention, the estimate $\hat\beta_i = \infty$, meaning that $\beta_i$ is very large, when $r_i = 0$. In an
experimental situation, we expect to choose $n$ and $T$ large enough that the probability that
$r_1 = 0$ or $r_2 = 0$ is very small. Obviously, we cannot expect to obtain information on the
failure parameters unless we are willing to test until some failures are observed.


Case B. Relative magnitude of subpopulation parameters known 


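In many practical situations the experimenter knows the relative magnitude of $\beta_1$ and $\beta_2$. Without loss of generality, let us assume that $\beta_1 < \beta_2$. The maximum likelihood method described in case A produces some estimates for which $\hat\beta_1 > \hat\beta_2$. When $\hat\beta_1 > \hat\beta_2$, given $\beta_1 < \beta_2$, we shall say that a crossover has occurred. Since it is known that $\beta_1 < \beta_2$, it would seem reasonable to choose $\hat\beta_1 = \hat\beta_2$ when a crossover occurs. The maximum likelihood estimate of $\beta_1 = \beta_2 = \beta$ is

$$\hat\beta = \frac{1}{r}\left[\sum_{j=1}^{r_1} x_{1j} + \sum_{j=1}^{r_2} x_{2j} + (n - r)\right] \qquad (4.21)$$

and

$$\hat p = \frac{r_1}{r}. \qquad (4.22)$$

(With $\beta_1 = \beta_2$ the conditional mixture proportion $k$ equals $p$, and (4.12) then reduces to (4.22).)

Hence the adjusted estimation procedure will be to choose as estimates the solution of equations (4.10), (4.11), (4.12) and (4.13) unless the estimates form a crossover. If $\hat\beta_1 > \hat\beta_2$, assume $\beta_1 = \beta_2 = \beta$ and obtain the adjusted estimates of $\beta$ and $p$ from equations (4.21) and (4.22).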
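The Case B rule is a simple branch on the ML estimates. A minimal sketch, assuming (as in the text) that $\beta_1 < \beta_2$ is known a priori and that times are in units of $T$; the function name and argument order are illustrative assumptions, and `sum_x1`, `sum_x2` are the sums of the scaled failure times in each subpopulation.

```python
def adjusted_estimates(n, r1, r2, sum_x1, sum_x2, beta1_hat, beta2_hat, p_hat):
    """Case B: beta1 < beta2 is known; if the ML estimates cross over, pool them."""
    if beta1_hat <= beta2_hat:                 # no crossover: keep the ML estimates
        return beta1_hat, beta2_hat, p_hat
    r = r1 + r2
    beta = (sum_x1 + sum_x2 + (n - r)) / r     # (4.21): total life / observed failures
    return beta, beta, r1 / r                  # (4.22): p-hat = r1 / r

# Hypothetical numbers chosen so that a crossover occurs.
print(adjusted_estimates(10, 2, 3, 1.0, 0.5, beta1_hat=0.9, beta2_hat=0.4, p_hat=0.3))
```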


5. AN EXAMPLE 


The method of solving the likelihood equations, described above, is utilized in the following 
example. 

The data recorded in Tables 1 and 2 are times to failure for ARC-1 VHF communication 
transmitter-receivers of a single commercial airline. Units which failed were removed from 
the aircraft for maintenance. However, in some cases the apparent failures were unconfirmed, 
exhibiting satisfactory operation upon arrival at the maintenance centre. Practical con- 
siderations make it desirable to estimate the fraction of unconfirmed failures in the popula- 
tion. Hence the sample of failures may be subdivided into confirmed failures, shown 
in Table 1, and unconfirmed failures shown in Table 2. The sample was censored at 
T = 630 hours as it was a general policy of the airline to remove units which had operated 
for 630 hours. Histograms plotted for both confirmed failures and unconfirmed failures 
suggest that both subpopulations of failures are exponentially distributed. 
Considering unconfirmed failures as subpopulation (1) and confirmed failures as subpopulation (2), the data from Tables 1 and 2 yield the following, with times measured in units of $T = 630$ hours:

$$n = 369, \quad r_1 = 107, \quad r_2 = 218, \quad r = r_1 + r_2 = 325, \quad n - r = 369 - 325 = 44,$$

$$\bar x_1 = \frac{1}{r_1 T}\sum_{j=1}^{r_1} t_{1j} = 0.3034862, \qquad \bar x_2 = \frac{1}{r_2 T}\sum_{j=1}^{r_2} t_{2j} = 0.3644677.$$

Table 1. Confirmed failures. Hours to failure for ARC-1 VHF radio transmitter-receivers*

   16  224   16   80  128  168
  392  576  128   56  112  160
  408  384  256  246  184  440
  304   16   72    8   88  160
  208  194  136  224   32  504
  256  216  168  184  144  224
  488  120  208   32  112  288
   60  208  440  104  528  384
  360  232   40  112  120   32
   56   72   64   40  480  152
  168  168  114  280  128  416
   96  536  400   80   40  112
  616  224   40   32  192  126
  328  464  448  616  168  112
   80   72   56  608  144  408
   80   16  424  264  256  528
  552   72  184  240  128   40
  272  152  328  480   96  296
   72  168   40  152  488  480
  112  288  168  352  160  272
  184  264   96  224  592  176
  152  208  160  176   72  584


Table 2. Unconfirmed failures. Hours to failure for ARC-1 VHF radio transmitter-receivers*

  368  136  512  136  472   96  144  112  104  104
  344  246   72   80  312   24  128  304   16  320
  560  168  120  616   24  176   16   24   32  232
   32  112   56  184   40  256  160  456   48   24
  200   72  168  288  112   80  584  368  272  208
  144  208  114  480  114  392  120   48  104  272
   64  112   96   64  360  136  168  176  256  112
  104  272  320    8  440  224  280    8   56  216
  120  256  104  104    8  304  240   88  248  472
  304   88  200  392  168   72   40   88  176  216
  152  184  400  424   88  152  184


* Data supplied through the courtesy of Dr G. R. Herd, Aeronautical Radio, Incorporated. 


Referring to equations (4.10), (4.11) and (4.12), the estimating equations are

$$\hat\beta_1 = 0.3035 + 0.4112\,\hat k, \qquad (5.1)$$

$$\hat\beta_2 = 0.5663 - 0.2018\,\hat k, \qquad (5.2)$$

$$\hat p = 0.2900 + 0.1192\,\hat k. \qquad (5.3)$$
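The coefficients in (5.1) to (5.3) follow directly from $n$, $r_1$, $r_2$ and the scaled means. The sketch below re-derives them as an illustration: it transcribes Table 2 row by row (the 107 values total 20 458 hours, which reproduces $\bar x_1 = 0.3034862$), while $\bar x_2 = 0.3644677$ is taken directly from the text rather than recomputed from Table 1. Variable names are illustrative.

```python
# Table 2 (unconfirmed failures), hours to failure, transcribed row by row.
TABLE_2_HOURS = [
    368, 136, 512, 136, 472, 96, 144, 112, 104, 104,
    344, 246, 72, 80, 312, 24, 128, 304, 16, 320,
    560, 168, 120, 616, 24, 176, 16, 24, 32, 232,
    32, 112, 56, 184, 40, 256, 160, 456, 48, 24,
    200, 72, 168, 288, 112, 80, 584, 368, 272, 208,
    144, 208, 114, 480, 114, 392, 120, 48, 104, 272,
    64, 112, 96, 64, 360, 136, 168, 176, 256, 112,
    104, 272, 320, 8, 440, 224, 280, 8, 56, 216,
    120, 256, 104, 104, 8, 304, 240, 88, 248, 472,
    304, 88, 200, 392, 168, 72, 40, 88, 176, 216,
    152, 184, 400, 424, 88, 152, 184,
]

T = 630.0                      # censoring time in hours
n, r1, r2 = 369, 107, 218      # sample size and failures per subpopulation
survivors = n - r1 - r2        # n - r = 44

xbar1 = sum(TABLE_2_HOURS) / (r1 * T)   # mean failure time of subpopulation 1, units of T
xbar2 = 0.3644677                       # value quoted in the text

# (intercept, slope in k) for each estimating equation.
coefficients = {
    "beta1": (xbar1, survivors / r1),                     # xbar1 + k(n - r)/r1
    "beta2": (xbar2 + survivors / r2, -survivors / r2),   # xbar2 + (1 - k)(n - r)/r2
    "p": (r1 / n, survivors / n),                         # [r1 + k(n - r)]/n
}

for name, (a, b) in coefficients.items():
    print(f"{name} = {a:.4f} + ({b:.4f}) k")
```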
The process of obtaining the iterative solution is simplified by using a table similar to Table 3. The first step is to enter Fig. 1 with $\bar x_1 = 0.303$ and obtain the first estimate of $k$, $\hat k_0 = 0.186$. The corresponding value of $\hat\beta_1$, $\hat\beta_{10} = 0.380$, can be obtained from the first estimating equation (5.1), and then, utilizing (5.2) and (5.3), $\hat\beta_{20}$ and $\hat p_0$ can be easily obtained. These values are shown in row $u = 0$ of Table 3.


Table 3. Record of iterations

  u   $\hat k_u$   $\hat\beta_{2u}$   $\hat p_u$   $v_u$    $g(\hat k_u)$
  0   0.186        0.529              0.312        4.622    0.1779
  1   0.166        0.5328             0.3098       5.024    0.1660
  2   0.167        0.5326             0.3099       5.002    0.1666
  3   0.165        0.5330             0.3097       5.046    0.1654
The next step is to compute

$$g(\hat k_0) = \frac{1}{1 + (\hat q_0/\hat p_0)\exp\left[1/\hat\beta_{10} - 1/\hat\beta_{20}\right]} = 0.1779$$

and $D_0 = g(\hat k_0) - \hat k_0 = -0.0081$. The value of $\hat k$ which corresponds to the solution of the maximum likelihood equations will occur when $D = 0$. Since $D$ is positive or zero when $\hat k = 0$ and negative when $\hat k = 0.186$, the solution for $\hat k$ must satisfy $0 < \hat k < 0.186$. Hence the value of $\hat k$ for $u = 1$ must be less than 0.186. The change in $\hat k$, $d\hat k_0$, can now be computed from equation (4.19).


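$$d\hat k_0 = \frac{D_0}{1 + g(\hat k_0)^2\,(dv_0/d\hat k_0)} = \frac{-0.0081}{1 + (0.1779)^2(-19.04)} = -0.02.$$

Hence $\hat k_1 = \hat k_0 + d\hat k_0 = 0.186 - 0.02 = 0.166$, and

$$\hat\beta_{11} = 0.3718, \qquad \hat\beta_{21} = 0.5328, \qquad \hat p_1 = 0.3098.$$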


For all practical purposes, these estimates are the maximum likelihood estimates of the parameters, since $D_1 = 0.0000$. A bound on the iteration error can be obtained by calculating $D$ for $\hat k_2 = 0.167$ and $\hat k_3 = 0.165$. Since $D_2 = -0.0004$ is negative and $D_3 = 0.0004$ is positive, and the solution for $\hat k$ is taken as 0.166, the absolute value of the iteration error for $\hat k$ is clearly less than 0.001.

The estimate of the fraction of unconfirmed failures is $\hat p = 0.3098$, and their average life is estimated to be

$$\hat\alpha_1 = \hat\beta_1 T = (0.3718)(630) = 234.2 \text{ hours}.$$

The estimate of the average life of the confirmed failures is

$$\hat\alpha_2 = \hat\beta_2 T = (0.5328)(630) = 335.7 \text{ hours}.$$


It should be noted that a fairly accurate solution of the estimating equations was obtained with only one iteration. However, Norton (1956) warns that one or two iterations on maximum likelihood equations may not be sufficient. Therefore, it would seem desirable to place bounds on the iteration error, as was done in this example. All calculations, including those for the two boundary values, were made on a desk calculator in less than half an hour. An iterative scheme for an automatic computer can be programmed to deliver a much more accurate solution in a minute or less.
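The bounding argument of the example is easy to mechanize. The sketch below is an illustration only: it evaluates $D(\hat k)$ from the rounded equations (5.1) to (5.3), confirms the sign change between 0.165 and 0.167, and then narrows that bracket by bisection. Bisection is a swapped-in technique, chosen because it keeps a guaranteed bracket at every step; it is not the procedure used in the paper, and the names `d_of_k` and `bisect` are hypothetical.

```python
import math

def d_of_k(k):
    """D(k) = g(k) - k from the rounded estimating equations (5.1)-(5.3)."""
    beta1 = 0.3035 + 0.4112 * k          # (5.1)
    beta2 = 0.5663 - 0.2018 * k          # (5.2)
    p = 0.2900 + 0.1192 * k              # (5.3)
    v = ((1.0 - p) / p) * math.exp(1.0 / beta1 - 1.0 / beta2)
    return 1.0 / (1.0 + v) - k

def bisect(f, lo, hi, tol=1e-7):
    """Halve a sign-change bracket [lo, hi] of f until it is narrower than tol."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:      # root lies in [lo, mid]
            hi = mid
        else:                      # root lies in [mid, hi]
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

print(d_of_k(0.165), d_of_k(0.167))   # opposite signs bracket the root
k_root = bisect(d_of_k, 0.165, 0.167)
print(round(k_root, 4))
```

Each step halves the bracket, so about fifteen steps take its width from 0.002 to below $10^{-7}$.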

It was previously mentioned that the parameter of primary interest is $p$. The estimates of the average life of units from the two subpopulations, $\alpha_1$ and $\alpha_2$, may be useful in anticipating maintenance requirements. In any case, this example represents an unusual and interesting application of the methods of estimation for mixed exponentially distributed failure populations.

6. PROPERTIES OF THE ESTIMATES 


Small sample properties of the estimates were obtained by empirical sampling for various parameter points, a parameter point being identified as a specific combination of $n$, $\beta_1$, $\beta_2$ and $p$. Fifty samples were drawn at each parameter point, and estimates were computed by both the maximum likelihood and the adjusted estimation procedures. The means of the parameter estimates, based on $N = 50$ samples, for the two estimation procedures are presented in Table 4. The corresponding estimated variances, computed from the formula


8 E (6:1) 


are given in Table 5. The symbol A is used to denote the number of crossovers per group of 
N = 50 samples, while the subscript A is used to identify the estimated means and variances 
for the adjusted estimation procedure. The expected value of r;, E(r,), is given also. 

The properties of the estimates were investigated primarily at parameter points where the bias and variance of the estimates might be expected to be large. At first glance, a sample size of $n = 100$ may seem large, but this view must be tempered by consideration of the test termination time $T$. As $T$ approaches zero, the number of observed failures diminishes until, in an extreme case, no failures may be observed. In this latter case, obviously very little information can be obtained from the sample. The values of $E(r_i)$ are a better indication of the efficiency of estimation, although the efficiency of estimating $\beta_i$ is approximately proportional to $E(r_i)$ only when $\beta_i$ is very small. Hence one should consider the relative magnitude of $\alpha_i$ and $T$, expressed as the ratio $\beta_i = \alpha_i/T$, as well as the sample size, $n$, when making comparisons.

An examination of Tables 4 and 5 reveals that, in general, estimation was poorest, both with respect to bias and with respect to variance, for those parameter points at which a large number of crossovers occurred. As would be expected, the use of the adjusted procedure for the crossover cases brings about a substantial improvement.

For parameter point number 6, for which $\beta_1 = 0.2$, $\beta_2 = 0.6$ and $p = …$, a total of 209 samples was generated. The results are shown as histograms in Fig. 2.

The large-sample variances of the estimates can be obtained by inverting the symmetric information matrix, $I$, where


22701675000 nak eee 


FREQ) BPS fipa 
4 UNO ^ ei. iem] 2608-8 | bes, 
Io, M. p» = 5 [ fi FQ) fipa 
pautna 
Py PY 


Fig. 2. Histograms of estimates at parameter point 6. N = 209.

Table 6. A comparison of the asymptotic variances and the estimated variances

The asymptotic variances, indicated by the symbol $\sigma^2$, along with the estimated variances obtained from empirical sampling, are given in Table 6 for parameter points 3, 6, 14, 19 and 22. For parameter points 6 and 19 the agreement between empirical and asymptotic variances is remarkably good. In both cases $\beta_1$ and $\beta_2$ are relatively small. For points 3, 14 and 22 the empirical variances are much larger than the corresponding asymptotic variances, suggesting that, for those combinations of $\beta_1$, $\beta_2$ and $p$, $n$ was not sufficiently large for asymptotic conditions to hold.
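The paper obtains asymptotic variances by inverting the expected information matrix. As a rough numerical stand-in, and not the authors' method, the sketch below inverts the observed information instead: the negated Hessian of the log-likelihood at the estimates, computed by central finite differences. The log-likelihood is written from the model (times in units of $T$, additive constant dropped); the function names, the step size `h` and the plugged-in estimates from the example of § 5 are assumptions.

```python
import math

def loglik(theta, n, r1, r2, s1, s2):
    """Two-exponential mixture log-likelihood, times in units of T, constant dropped.

    theta = (beta1, beta2, p); s1, s2 are sums of the scaled failure times.
    """
    b1, b2, p = theta
    q = 1.0 - p
    survivors = n - r1 - r2
    g1 = math.exp(-1.0 / b1)          # G_1(1): subpopulation 1 still working at T
    g2 = math.exp(-1.0 / b2)          # G_2(1)
    return (r1 * math.log(p) + r2 * math.log(q)
            - r1 * math.log(b1) - s1 / b1
            - r2 * math.log(b2) - s2 / b2
            + survivors * math.log(p * g1 + q * g2))

def numerical_hessian(f, x, h=1e-4):
    """Central-difference Hessian of f at the point x."""
    d = len(x)
    H = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            def f_shift(di, dj):
                y = list(x)
                y[i] += di
                y[j] += dj
                return f(y)
            H[i][j] = (f_shift(h, h) - f_shift(h, -h)
                       - f_shift(-h, h) + f_shift(-h, -h)) / (4.0 * h * h)
    return H

def inverse3(a):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = a
    det = (a11 * (a22 * a33 - a23 * a32)
           - a12 * (a21 * a33 - a23 * a31)
           + a13 * (a21 * a32 - a22 * a31))
    adj = [[a22 * a33 - a23 * a32, a13 * a32 - a12 * a33, a12 * a23 - a13 * a22],
           [a23 * a31 - a21 * a33, a11 * a33 - a13 * a31, a13 * a21 - a11 * a23],
           [a21 * a32 - a22 * a31, a12 * a31 - a11 * a32, a11 * a22 - a12 * a21]]
    return [[adj[i][j] / det for j in range(3)] for i in range(3)]

# Example of Section 5: summary statistics and approximate estimates.
n, r1, r2 = 369, 107, 218
s1, s2 = r1 * 0.3034862, r2 * 0.3644677
theta_hat = (0.3717, 0.5328, 0.3098)

H = numerical_hessian(lambda th: loglik(th, n, r1, r2, s1, s2), theta_hat)
observed_info = [[-h_ij for h_ij in row] for row in H]
cov = inverse3(observed_info)
std_errors = [math.sqrt(cov[i][i]) for i in range(3)]
print("approx. standard errors of beta1, beta2, p:", std_errors)
```

Observed and expected information agree only asymptotically, so these figures are a cross-check on the order of magnitude rather than a substitute for the paper's asymptotic variances.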

The effect of test termination time on the efficiency of the estimates is of considerable importance. Some light may be shed on this question by examination of the asymptotic variance-covariance matrix of $\hat\alpha_1$, $\hat\alpha_2$ and $\hat p$, even though it is undoubtedly true that for some regions of the parameter space the asymptotic results are quite different from the actual finite sample results.

With time measured in its original units, the information matrix of $\hat\alpha_1$, $\hat\alpha_2$ and $\hat p$ may be shown to be


G(T) k(Y — k) T? DK =k) 


pr (T) 
re 


ajas aipg 
FAT G(T)k(Y — k) T 
Iu, 2,» = % 2 21 ( E i (6-3) 
1 
— p 
z C] 
e a- über p EDT RIS 
o3 FT) a3 F(T) pq 


The variance-covariance matrix of $\hat\alpha_1$, $\hat\alpha_2$ and $\hat p$, $I^{-1}(\alpha_1, \alpha_2, p)$, is


at opo, ee e 
pE(T) BETA pR 

TG, 8,2) = =p ay G(T) K1-k)T], (64) 
— ieee uum. 
e 


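where $D = 1 - (A + B + C)$.
The off-diagonal terms of $I^{-1}(\alpha_1, \alpha_2, p)$ approach zero as $T$ approaches infinity, implying uncorrelated estimates of $\alpha_1$, $\alpha_2$ and $p$ when the sample is uncensored. Also,

$$\lim_{T\to\infty} A = \lim_{T\to\infty} B = \lim_{T\to\infty} C = 0$$

and, consequently, $\lim_{T\to\infty} D = 1$. Obviously $F_1(\infty) = F_2(\infty) = 1$. Therefore,

$$\lim_{T\to\infty} \sigma^2_{\hat\alpha_1} = \frac{\alpha_1^2}{np}, \qquad (6.5)$$

$$\lim_{T\to\infty} \sigma^2_{\hat\alpha_2} = \frac{\alpha_2^2}{nq}, \qquad (6.6)$$

$$\lim_{T\to\infty} \sigma^2_{\hat p} = \frac{pq}{n}. \qquad (6.7)$$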

From an intuitive point of view, the larger the value of $T$, the greater will be the amount of information on $\alpha_1$, $\alpha_2$ and $p$. In fact, if the sample were not censored, and hence $T$ were infinitely large, a sufficient statistic for $p$ would be the binomial estimate, $\hat p = r_1/n$, and its variance would be $pq/n$. It is therefore not surprising that, in the limit, $\sigma^2_{\hat p}$ approaches $pq/n$.

The behaviour of the ratio of $\sigma^2_{\hat\alpha_1}$ to its limiting value, $\alpha_1^2/(np)$, as a function of $T$ is shown in Figs. 3a and 3b. In the first figure $p = 0.05$, in the second $p = 0.30$. Curves are shown for $\alpha_1 = 0.25$ and $\alpha_1 = 1.0$, with $\alpha_2 = 1.0$ in all cases. When $\alpha_1 = \alpha_2$ the curves can be shown to be independent of $p$.
The asymptotic variance of the maximum likelihood estimate of the parameter in the case of a single exponential distribution, with sampling censored at time $T$, is

$$\sigma^2_{\hat\alpha} = \frac{\alpha^2}{E(r)}. \qquad (6.8)$$

The information on $\alpha$ in this case is based, in part, on the knowledge of the number surviving the test. In the case of the mixed population, this information is unavailable, since only the sum of the numbers of survivors from the two subpopulations, $(n - r)$, is known. Therefore, it would be reasonable to expect

$$\sigma^2_{\hat\alpha_i} \ge \frac{\alpha_i^2}{E(r_i)}. \qquad (6.9)$$

Fig. 3a. Ratio of the asymptotic variances of $\hat\alpha_1$ and $\hat\alpha_2$ to their limiting values as functions of the test termination time. $\alpha_1 = 0.25, 1.0$; $\alpha_2 = 1.0$; $p = 0.05$.

Fig. 3b. Ratio of the asymptotic variances of $\hat\alpha_1$ and $\hat\alpha_2$ to their limiting values as functions of the test termination time. $\alpha_1 = 0.25, 1.0$; $\alpha_2 = 1.0$; $p = 0.30$.

Noting that $A$, $B$ and $C$ are always positive, it is obvious that $\sigma^2_{\hat\alpha_i}$ is equal to the product of $\alpha_i^2/E(r_i)$ and a coefficient which is always greater than or equal to one when $D > 0$. Hence, when $D > 0$, the inequality (6.9) holds. $D$ will be less than or equal to zero only when $T$ is very small, and this case is of little interest from a practical standpoint.
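Under the exponential model, $E(r_i) = n p_i F_i(T) = n p_i (1 - e^{-T/\alpha_i})$, so the right-hand side of (6.9) divided by the uncensored limit $\alpha_i^2/(np_i)$ is simply $1/(1 - e^{-T/\alpha_i})$. The sketch below computes this lower bound on the variance ratio plotted in Figs. 3a and 3b (valid when $D > 0$). It does not reproduce the curves themselves, which also depend on $A$, $B$ and $C$; the function names are illustrative.

```python
import math

def expected_failures(n, p_i, alpha_i, T):
    """E(r_i) = n p_i F_i(T) for an exponential subpopulation with mean life alpha_i."""
    return n * p_i * (1.0 - math.exp(-T / alpha_i))

def variance_lower_bound(n, p_i, alpha_i, T):
    """Right-hand side of (6.9): alpha_i**2 / E(r_i)."""
    return alpha_i ** 2 / expected_failures(n, p_i, alpha_i, T)

def bound_over_limit(alpha_i, T):
    """(6.9) bound divided by the uncensored limit alpha_i**2 / (n p_i); n, p_i cancel."""
    return 1.0 / (1.0 - math.exp(-T / alpha_i))

for T in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(T, round(bound_over_limit(1.0, T), 4), round(bound_over_limit(0.25, T), 4))
```

As $T$ grows the ratio falls towards one, matching the limits (6.5) and (6.6).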

Figs. 3a and 3b also show the ratio of $\sigma^2_{\hat\alpha_2}$ to its limiting value, $\alpha_2^2/(nq)$. Again $\alpha_2$ was assumed equal to 1.0, and only the $\alpha_1 = 0.25$ and $\alpha_1 = 1.0$ results were plotted.

The curves in Figs. 3a and 3b could be used to help decide whether or not a given increase in termination time, $T$, will yield enough additional information to offset the cost of the increased testing time.


7. RELATION TO RESULTS FOR A SINGLE EXPONENTIALLY DISTRIBUTED POPULATION 


The estimating equations for the mixed exponentially distributed subpopulations are obviously a logical extension of the results for the case of a single exponentially distributed failure population. The maximum likelihood estimate of the parameter, $\alpha$, of a single population is

$$\hat\alpha = \frac{1}{r}\,(\text{total observed life}), \qquad (7.1)$$

where the total observed life is

$$\sum_{j=1}^{r} t_j + (n - r)T. \qquad (7.2)$$


The observed life for the mixed populations can be divided into three portions, namely

$$\sum_{j=1}^{r_1} t_{1j}, \qquad \sum_{j=1}^{r_2} t_{2j} \qquad \text{and} \qquad (n - r)T.$$

The first two quantities obviously can be allocated to their appropriate subpopulations. However, since we do not know how many of the $(n - r)$ non-failures belong to each subpopulation, we cannot be certain of properly dividing the observed life represented by $(n - r)T$ between the two subpopulations. It is therefore necessary to estimate the portion to be assigned to each subpopulation. The expected number of the $(n - r)$ items belonging to subpopulation (1) is $k(n - r)$. It therefore would seem reasonable to apportion $k(n - r)T$ to subpopulation (1) and $(1 - k)(n - r)T$ to subpopulation (2). Since the estimating equations (4.10) and (4.11) can be re-written as

$$\hat\alpha_1 = \frac{1}{r_1}\left[\sum_{j=1}^{r_1} t_{1j} + \hat k(n - r)T\right], \qquad (7.3)$$

$$\hat\alpha_2 = \frac{1}{r_2}\left[\sum_{j=1}^{r_2} t_{2j} + (1 - \hat k)(n - r)T\right], \qquad (7.4)$$

we see that, just as in the case of the single population, our estimates can be expressed as ‘total life’ divided by the number of observed failures.
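Both forms of "total life divided by observed failures" fit in a few lines. A minimal sketch, with hypothetical function names, taking failure times in the original units and treating $\hat k$ as already known (for instance from the iteration of § 4):

```python
def exponential_mle_censored(failure_times, n, T):
    """(7.1)-(7.2): alpha-hat = total observed life / number of failures."""
    r = len(failure_times)
    total_life = sum(failure_times) + (n - r) * T   # failures plus survivors' time on test
    return total_life / r

def mixture_mean_lives(t1, t2, n, T, k):
    """(7.3)-(7.4): give k(n - r)T of the survivors' life to subpopulation 1
    and (1 - k)(n - r)T to subpopulation 2."""
    r1, r2 = len(t1), len(t2)
    survivor_life = (n - r1 - r2) * T
    alpha1 = (sum(t1) + k * survivor_life) / r1
    alpha2 = (sum(t2) + (1.0 - k) * survivor_life) / r2
    return alpha1, alpha2

# Hypothetical toy data: 5 units on test until T = 5, with 3 failures.
print(exponential_mle_censored([1.0, 2.0, 3.0], n=5, T=5.0))
print(mixture_mean_lives([1.0, 2.0], [3.0], n=5, T=5.0, k=0.5))
```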
8. RESULTS FOR A MORE GENERAL MODEL
Consider a mixture of $s$ failure subpopulations, mixed in proportion $p_1 : p_2 : \ldots : p_s$, where $0 \le p_i \le 1$ and $\sum_{i=1}^{s} p_i = 1$. Unless otherwise indicated, in this section $i$ ranges from $i = 1$ to $i = s$ and $j$ ranges, as before, from $j = 1$ to $j = r_i$. Assume that the $i$th subpopulation is distributed according to a distribution function $F_i(t)$ whose parameters are independent of $p_1, p_2, \ldots, p_s$. Then the cumulative distribution function is

$$F(t) = \sum_{i=1}^{s} p_i F_i(t). \qquad (8.1)$$

A random sample of $n$ units is tested to time $t = T$. The number of units, $r_i$, belonging to the $i$th subpopulation and failing before time $T$ is recorded, along with the actual failure times, $t_{ij}$. Let $\sum_{i=1}^{s} r_i = r$. Then $(n - r)$ units, which cannot be identified as to subpopulation, survive the test.


Assume that all measurements are in units of size $T$, and let $x = t/T$ and $\beta_i = \alpha_i/T$. The conditional mixture proportion, defined in § 2, is

$$p_i(x) = \frac{p_i G_i(x)}{G(x)}, \qquad (8.2)$$

where $G_i(x) = 1 - F_i(x)$, $G(x) = 1 - F(x)$, $0 \le p_i(x) \le 1$, $\sum_{i=1}^{s} p_i(x) = 1$, $p_i = p_i(0)$ and $p_i(1) = k_i$.
The likelihood $L$ is

$$L = \frac{n!}{(n - r)!}\,[G(1)]^{n-r}\prod_{i=1}^{s}\left[p_i^{r_i}\prod_{j=1}^{r_i} f_i(x_{ij})\right]. \qquad (8.3)$$
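It follows that the first partial derivative of $\ln L$ with respect to $p_i$ $(i = 1, 2, \ldots, s - 1)$, with $p_s = 1 - p_1 - \ldots - p_{s-1}$, is

$$\frac{\partial \ln L}{\partial p_i} = (n - r)\frac{\partial \ln G(1)}{\partial p_i} + \frac{r_i}{p_i} - \frac{r_s}{p_s} = (n - r)\left(\frac{k_i}{p_i} - \frac{k_s}{p_s}\right) + \frac{r_i}{p_i} - \frac{r_s}{p_s}. \qquad (8.4)$$

Setting $\partial \ln L/\partial p_i$ equal to zero shows that $[r_i + \hat k_i(n - r)]/\hat p_i$ takes the same value for every $i$; since the numerators sum to $r + (n - r) = n$ and the $\hat p_i$ sum to one, this common value is $n$, so that

$$\hat p_i = \frac{r_i + \hat k_i(n - r)}{n}. \qquad (8.5)$$

The likelihood equations for $\hat p_i$ are thus linear functions of the estimates of the corresponding conditional mixture proportions, $\hat k_i = \hat p_i(1)$, regardless of the distributional form of $F_i(x)$.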
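Equation (8.5) is the same for any choice of $F_i$. A minimal sketch with an illustrative function name; the $\hat k_i$ are assumed given and summing to one, in which case the $\hat p_i$ also sum to one.

```python
def mixture_proportions(r, k, n):
    """(8.5): p_i = [r_i + k_i (n - r)] / n for i = 1, ..., s."""
    survivors = n - sum(r)
    return [(r_i + k_i * survivors) / n for r_i, k_i in zip(r, k)]

# Hypothetical example with s = 3 subpopulations.
p_hat = mixture_proportions([2, 3, 5], [0.2, 0.3, 0.5], n=15)
print(p_hat, sum(p_hat))
```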

It would be desirable to consider a distribution function, $F_i(x)$, more general in form than the exponential, in order to provide a model which will fit a larger set of failure populations. The frequent occurrence of exponentially distributed failure populations makes it desirable to choose a family of distributions for which the exponential would be a special case. The family of distributions represented by the Weibull function,

$$F_i(x) = 1 - \exp\left[-(x/\beta_i)^{m_i}\right] \qquad (8.6)$$

and

$$f_i(x) = \frac{m_i}{\beta_i}\left(\frac{x}{\beta_i}\right)^{m_i - 1}\exp\left[-(x/\beta_i)^{m_i}\right], \qquad (8.7)$$

is one possibility, since $F_i(x)$ is the exponential distribution when $m_i = 1$. Assuming various values for the shape parameter, $m_i$, provides a wide range of distributional shapes from which to choose.


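Let

$$F(x) = \sum_{i=1}^{s} p_i F_i(x),$$

where $F_i(x) = 1 - \exp[-(x/\beta_i)^{m_i}]$ and the values of $m_i$ are assumed known. Taking the first partial derivative of $\ln L$ with respect to $\beta_i$ yields

$$\frac{\partial \ln L}{\partial \beta_i} = \frac{m_i (n - r) k_i}{\beta_i^{m_i + 1}} - \frac{m_i r_i}{\beta_i} + \frac{m_i}{\beta_i^{m_i + 1}}\sum_{j=1}^{r_i} x_{ij}^{m_i}.$$

Setting $\partial \ln L/\partial \beta_i$ equal to zero and simplifying,

$$\hat\beta_i^{m_i} = \frac{1}{r_i}\left[\sum_{j=1}^{r_i} x_{ij}^{m_i} + \hat k_i(n - r)\right].$$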

When s = 2 and m, = m, = 1, the estimating equations reduce to those obtained for the 
case of two exponentials. 
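For known shapes $m_i$, the Weibull estimating equation gives $\hat\beta_i$ in closed form once $\hat k_i$ is fixed. A sketch with an illustrative function name, times in units of $T$; setting $m_i = 1$ recovers the exponential form $\hat\beta_i = [\sum_j x_{ij} + \hat k_i(n - r)]/r_i$.

```python
def weibull_scale_estimate(x_i, k_i, survivors, m_i):
    """beta_i-hat from beta_i**m_i = [sum_j x_ij**m_i + k_i (n - r)] / r_i.

    x_i: failure times of subpopulation i in units of T; m_i: known shape;
    survivors: n - r.
    """
    r_i = len(x_i)
    beta_pow = (sum(x ** m_i for x in x_i) + k_i * survivors) / r_i
    return beta_pow ** (1.0 / m_i)

print(weibull_scale_estimate([0.2, 0.4], k_i=0.5, survivors=2, m_i=1))  # exponential case
print(weibull_scale_estimate([0.5, 0.5], k_i=0.5, survivors=2, m_i=2))
```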

It should not be too difficult to set up an iterative method for the solution of the maximum likelihood equations, regardless of the shape parameters, $m_i$, as long as the number of subpopulations, $s$, is small. When $s = 2$, the estimating equations can be solved for $\hat\beta_1^{m_1}$, $\hat\beta_2^{m_2}$ and $\hat p$ by the methods given in § 4. In general, there will be $(2s - 1)$ equations to solve for the same number of parameters. The procedure used for $s = 2$ was to reduce the three equations, by substitution, to a single equation

$$k = g(k),$$

and then determine the value of $k$ such that $g(k) - k = 0$. In the general case, it is possible to reduce the $(2s - 1)$ equations to $(s - 1)$ equations of the form

$$g_i(K) - k_i = 0 \qquad (i = 1, 2, \ldots, s - 1),$$

where $k_i = p_i(1)$ is the conditional mixture proportion for subpopulation $i$ at time $t = T$ and $K = (k_1, k_2, \ldots, k_{s-1})$. The iteration method would then involve the selection of a vector $K$ as the solution of the $(s - 1)$ simultaneous equations. If a unique maximum likelihood solution exists, the solution will be a point in a restricted region within a unit cube in the $(s - 1)$-dimensional space of $K$, since

$$0 \le k_i \le 1 \quad \text{and} \quad \sum_{i=1}^{s} k_i = 1.$$

A digital computer would have little difficulty in locating the solution by trial and error when $s$ is small. More efficient procedures for solving the equations by iterative methods are given in texts on numerical methods. Once $K$ is determined, the estimates of $\beta_i$ and $p_i$ can be obtained from the original maximum likelihood equations.
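One simple way to search for $K$ is plain successive substitution, $K \leftarrow g(K)$: compute $\hat p_i$ from (8.5) and $\hat\beta_i^{m_i}$ from the Weibull estimating equation at the current $K$, then recompute each $k_i = p_i G_i(1)/G(1)$, with $G_i(1) = \exp(-\beta_i^{-m_i})$. This is a swapped-in technique, neither the trial-and-error search mentioned above nor the Newton step of § 4, and it is not guaranteed to converge in general. The function name, the equal-proportion starting point and the tolerance are assumptions, and the sketch carries all $s$ components of $K$ for convenience (they sum to one). For the two-exponential example of § 5 it settles on the same root as the Newton iteration.

```python
import math

def solve_conditional_proportions(n, r, s_pow, m, tol=1e-12, max_iter=10_000):
    """Successive substitution K <- g(K) for s Weibull subpopulations with known
    shapes m[i] (m[i] = 1 gives exponentials). Times are in units of T.

    r[i]     : observed failures in subpopulation i (assumed > 0)
    s_pow[i] : sum over j of x_ij ** m[i]
    Returns lists (k, p, beta).
    """
    s = len(r)
    survivors = n - sum(r)
    k = [1.0 / s] * s                                     # equal-proportion start
    for _ in range(max_iter):
        p = [(r[i] + k[i] * survivors) / n for i in range(s)]                # (8.5)
        beta_pow = [(s_pow[i] + k[i] * survivors) / r[i] for i in range(s)]  # beta_i**m_i
        at_risk = [p[i] * math.exp(-1.0 / beta_pow[i]) for i in range(s)]    # p_i G_i(1)
        total = sum(at_risk)                                                 # G(1)
        k_new = [a / total for a in at_risk]                                 # k_i = p_i(1)
        converged = max(abs(a - b) for a, b in zip(k_new, k)) < tol
        k = k_new
        if converged:
            break
    p = [(r[i] + k[i] * survivors) / n for i in range(s)]
    beta = [((s_pow[i] + k[i] * survivors) / r[i]) ** (1.0 / m[i]) for i in range(s)]
    return k, p, beta

# Two-exponential example of Section 5.
k, p, beta = solve_conditional_proportions(
    369, [107, 218], [107 * 0.3034862, 218 * 0.3644677], [1, 1])
print([round(v, 4) for v in k], [round(v, 4) for v in p], [round(v, 4) for v in beta])
```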


9. CONCLUSION

The maximum likelihood estimation procedure appears to give satisfactory results when the sample size $n$ is large and the test termination time, $T$, is large relative to $\alpha_1$ and $\alpha_2$. When $n$ and $T$ are small, the estimates are badly biased and have large variances. It would seem desirable to investigate the use of other estimators having better small sample properties.

In most experimental situations the relative magnitude of $\alpha_1$ and $\alpha_2$ will be known. When this is true, it is possible to modify the estimating procedure in a simple way and thereby substantially reduce the bias and variances of the estimates.


The authors wish to thank Mr J. Durbin and Mr A. Stuart for reading the manuscript and for their helpful suggestions.
A BIBLIOGRAPHY ON LIFE TESTING AND RELATED TOPICS


By WILLIAM MENDENHALL
Research Techniques Unit, London School of Economics and Institute of Statistics, 
North Carolina State College 


The following bibliography covers papers concerned with statistical theory and methods 
applicable to the study of the life characteristics of some biological or physical body. For 
instance, we may desire knowledge concerning the life characteristics of a certain type of 
electronic tube, a type of bacteria, or perhaps of a complex system such as a high-speed 
digital computer. In general, information is obtained either as the result of a planned 
laboratory experiment or through the analysis of data obtained from service use of the 
product. We will loosely describe either of these situations as a life test. In this bibliography 
the major emphasis will be given to the industrial applications of life and fatigue tests with 
related statistical theory, although many of the statistical methods developed for life tests 
of industrial products are applicable to studies of life in other fields. 

One might ask why statistical methods developed for analysing data from other fields of 
experimentation are not applicable in analysing the results of a life test. Why consider 
life testing as a special topic? If a random sample of items drawn from a population is 
tested until all items fail, conventional statistical techniques may be employed although, 
in many cases, the assumption that the underlying frequency distributions are normal 
(Gaussian) will not be satisfied. Failure of the data to satisfy the assumption of normality 
may present difficulties, although many statistical tests have been shown to be robust to 
the failure of the data to satisfy this assumption. A second difficulty encountered in using 
conventional experimental methods in industrial life tests is the expense and time involved 
in waiting for all of the test items to fail. For this reason, many industrial tests are concluded before all of the test items fail. A sample obtained from such a test is said to be censored. By censoring, a given amount of information can be obtained in a shorter time at the expense of testing more items. Testing time may also be reduced by the use of an accelerated life test, in which the sample is subjected to stress conditions in excess of those encountered in its normal environment and which reduce the life of the product. Careful justification of accelerated life tests requires the experimenter to establish the relation between life characteristics under accelerated and normal stress conditions. Maximum utilization of test equipment, and a saving of time, may also be obtained by the replacement, or renewal, of a test item by a new item immediately upon failure. Another advantageous characteristic of life test data is that the observations are ordered in time and thus permit the use of order statistics. That is, the smallest observation is the first observed, the second smallest the next, etc. Therefore, we shall be concerned with statistical theory which applies to non-normal as well as to normal distributions, censored sampling and sampling with renewal, order statistics, and the treatment of data obtained under accelerated conditions. Very little has been published dealing with the last topic. Other topics of considerable interest in the study of life testing are extreme value theory, fatigue and wear tests, systems reliability, machine productivity problems, and the treatment of sensitivity data.
The references have been classified as belonging to one or more of nine subject groups.
They are presented in alphabetical order by author with the appropriate subject classifica- 
tion indicated by capital letters in parentheses at the end of the reference. Some of the 
subject classifications are obviously subtopics of others. For instance, an extreme value is 
an order statistic, but the theory of extreme values is of sufficient importance to warrant 
a separate classification. Similar overlapping occurs with censored sampling, since many 
of the methods used in the treatment of censored samples are based on order statistics. 
These references will be found under censored sampling and will not be repeated under order 
statistics. The choice of papers was governed by their relevancy to the possibility of appli- 
cation to the study of life characteristics. Some of the groups are represented only by a 
few of the more important papers whose bibliographies may be consulted for additional 
references. In general, the main body of Continental literature is represented only by 
papers revealed in the bibliographies of other papers. Papers which do not fall specifically 
into one of the nine groups are unclassified. 

The nine subject groups are as follows: 


C. Censored sampling or sampling from a truncated distribution, both univariate and 
multivariate. 

R. Sampling with renewal. 

O. Order statistics. (Excluding papers on extreme value theory or censored sampling.) 

E. Extreme value theory. 

K. Papers concerned with failure rates and conditional failure density. Also included 
are tests of randomness for a sequence of events occurring in time. 

F. Fatigue testing and wear problems. 

M. Machine productivity problems. This group includes some representative papers con- 
cerned with machine productivity of a group of machines subject to various failure 
laws and servicing arrangements. 

S. System reliability. 

D. Methods applicable to sensitivity data. These papers primarily deal with the fitting of dosage-mortality curves, although some refer explicitly to the use of these methods in the testing of explosives, etc.


For more extensive bibliographies on order statistics, the reader is referred to ‘Order Statistics’ (1948, by S. S. Wilks, Bull. Amer. Math. Soc. 54, 6) and ‘Bibliography of Nonparametric Statistics and Related Topics’ by I. R. Savage (1953, J. Amer. Statist. Ass. 48, 844). Papers concerned with machine productivity and the servicing of automatic machines are included in ‘A bibliography on the theory of queues’ (1957, by A. Doig; Biometrika, 44, 490). Two bibliographies on system reliability which contain both statistical and non-statistical papers are ‘A Summary of Reliability Literature’ (1956, by C. G. Moore, Jr., Naval Electronics Service Unit, Washington, D.C.) and ‘Literature Guide on Failure Control and Reliability’ (1956, by W. F. Luebbert; Technical Report No. 13, Stanford Electronic Laboratories, Stanford University).


The author wishes to thank Dr W. R. Buckland, Dr D. R. Cox, and Dr G. R. Herd, as 
well as other persons for helpful comments and bibliographical material. In particular, 
thanks are due to Prof. M. G. Kendall, Director of the Research Techniques Unit, London 
School of Economics, for direction and support of this work.
MISCELLANEA


A two sample distribution free test for comparing variances 


By BALKRISHNA V. SUKHATME 
Indian Council of Agricultural Research, New Delhi 


Introduction. Two sample tests of a non-parametric nature have been proposed by various authors for the problem of testing the equality of variances, particularly in papers by Mood (1954), … and Sukhatme (1957). They discuss the consistency and power properties of these tests. It has been shown by Sukhatme (1957) that some of these tests are reasonably efficient for normal alternatives and highly efficient for some non-normal alternatives.

These non-parametric tests are, however, of limited application in the sense that they presuppose knowledge about the relative location of the two populations, which is not always available. In such a case, the test can be modified by applying it to the deviations of the observations from the sample medians rather than to the observations themselves. Since the modified test is essentially the same, we would expect it to behave similarly to the original test, at least for large samples. On examination, it was found (Sukhatme, 1958) that this property is not shared by Mood's test, which was thought to be a competitor to the variance ratio F test. A new test satisfying the above property was therefore proposed by the author (Sukhatme, 1957). The test is based on the statistic


$$T = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} K(X_i, Y_j),$$

where $K(X, Y) = 1$ if either $0 < X < Y$ or $Y < X < 0$, and $K(X, Y) = 0$ otherwise.
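The statistic $T$ can be computed directly from its definition. A minimal sketch with a hypothetical function name; ties and zero observations contribute $K = 0$, which is how the strict inequalities in the definition read.

```python
def sukhatme_T(xs, ys):
    """T = (1/mn) * sum_i sum_j K(X_i, Y_j), with K = 1 if 0 < X < Y or Y < X < 0."""
    def K(x, y):
        return 1 if (0 < x < y) or (y < x < 0) else 0
    m, n = len(xs), len(ys)
    return sum(K(x, y) for x in xs for y in ys) / (m * n)

# Hypothetical samples: the pairs (1.0, 3.0) and (-0.5, -1.0) each score 1.
print(sukhatme_T([1.0, -0.5], [3.0, -1.0]))
```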


1. The proposed S test. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be two samples of independent observations drawn from populations with cumulative distribution functions $F(x)$ and $G(x)$, respectively. No knowledge is assumed concerning $F$ and $G$ except that they are absolutely continuous and differ in the scale parameter only. The test statistic for testing the hypothesis $H: F = G$ against the alternative $A: F \ne G$ may then be defined as


S= Sol, x. 70 FAX, L,Y) +e- È S K , 20, 
t=1j+k i*pj-1 tml jul 
where $s = m + n$,
$K(u, v) = 1$ if either $0 < u < v$ or $v < u < 0$, and $K(u, v) = 0$ otherwise,
and $Q(u, v, w) = 1$ if either $0 < u < w$, $0 < v < w$ or $w < u < 0$, $w < v < 0$, and $Q(u, v, w) = 0$ otherwise.


We reject the hypothesis if S is either too large or too small. 
An alternative expression for the test statistic

Let $r_i$ be the rank of the $i$th positive observation on $Y$ in order of magnitude in the combined sample of positive observations on $X$ and $Y$, and $r_i''$ the rank of the $i$th negative observation on $Y$ in order of magnitude in the combined sample of negative observations on $X$ and $Y$. Further, let

$n'$ = number of positive observations on $Y$, $n'' = n - n'$,
$m'$ = number of positive observations on $X$, $m'' = m - m'$,
$s' = m' + n'$, $s'' = m'' + n''$, $s = m + n$.


Then it can be shown that SSS. (2) 
T -8x 1 22 
RM X» DII 184. * 3) (c dn ) 
i ? (4 12 
ne -4 € ^m'(a--n* 42a" — 4) | (n 1) (Sa 8n* — 14 
nd 5. * LE 122 +s" 48 (n* +1) (3a & ) 
e 2 41 2 12 
This expression for the statistic S seems to be more convenient. In the subsequent development, we will 
use either of the two expressions as the case may be. 
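2. Expectation and variance of S under the hypothesis.

$$ES = ES_1 + ES_2 = E\{E[S_1 \mid n', m']\} + E\{E[S_2 \mid n'', m'']\}, \qquad (2.1)$$

$$\operatorname{var} S = E\{\operatorname{var}[S_1 \mid n', m']\} + E\{\operatorname{var}[S_2 \mid n'', m'']\} + \operatorname{var}\{E[S \mid n', m']\}. \qquad (2.2)$$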


For fixed n' and m', we have 


mo, nits’ f I 
1 1 E x *1) 
c 2 ii 


Wan mE IS V ES, cete ern 
nIm- a PEN * i (23) 
-D TD, pS E OR hne 

12 ij 


n(n — 1) (a +1) ENUT), 
i*j — E 


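Also $n'$ and $m'$ are binomial random variables with probability of success equal to one half, so that

$$En' = \tfrac{1}{2}n, \qquad En'^2 = \tfrac{1}{4}n(n + 1), \qquad En'^3 = \tfrac{1}{8}n^2(n + 3),$$

$$En'^4 = \tfrac{1}{16}(n^4 + 6n^3 + 3n^2 - 2n), \qquad En'^5 = \tfrac{1}{32}(n^5 + 10n^4 + 15n^3 - 10n^2). \qquad (2.4)$$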
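The binomial moments used here can be checked exactly. The sketch below, an illustration rather than part of the original derivation, compares the closed forms for $E n'^j$ ($j = 1, \ldots, 5$) with a direct sum over the Binomial$(n, \tfrac12)$ distribution, using rational arithmetic so that the comparison is exact; the names are illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_half_moment(n, j):
    """Exact E[X**j] for X ~ Binomial(n, 1/2)."""
    return Fraction(sum(comb(n, x) * x ** j for x in range(n + 1)), 2 ** n)

# Closed forms for E n'^j, j = 1, ..., 5.
CLOSED_FORMS = {
    1: lambda n: Fraction(n, 2),
    2: lambda n: Fraction(n * (n + 1), 4),
    3: lambda n: Fraction(n ** 2 * (n + 3), 8),
    4: lambda n: Fraction(n ** 4 + 6 * n ** 3 + 3 * n ** 2 - 2 * n, 16),
    5: lambda n: Fraction(n ** 5 + 10 * n ** 4 + 15 * n ** 3 - 10 * n ** 2, 32),
}

def mismatches(max_n=30):
    """(n, j) pairs, if any, where a closed form disagrees with the exact moment."""
    return [(n, j) for n in range(max_n + 1) for j, formula in CLOSED_FORMS.items()
            if binomial_half_moment(n, j) != formula(n)]

print(mismatches())
```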


Using these results, we obtain r (2-5) 
ES = gun ber- 35n — 2n* — 128+ 12n], 
var S = . 120054 48000 — 23442 — 600" + 2690) (2:6) 
x 

3. Expectation and variance of S under the alternative. Consider the positive observations only. Since $r_i$ is the rank of the $i$th positive observation on $Y$ in order of magnitude in the combined sample of positive observations, we have

$$r_i = i + \sum_{t=1}^{m'} K(X_t, Y_{(i)}), \qquad (3.1)$$


where $X_1, X_2, \ldots, X_{m'}$ is a random sample of $m'$ observations from the truncated population having c.d.f. $F'(x) = 2F(x) - 1$, and $Y_{(1)}, Y_{(2)}, \ldots, Y_{(n')}$ is an ordered sample drawn from the truncated population having the c.d.f. $G'(x) = 2G(x) - 1$. This is the basic relation that will be used in computing the moments of $S$. We will illustrate the method by computing


; 
We have, on squaring and summing over $i$,


^ a 


E 64-7 È b K(X, T E b K(X, Y) K(X» Y.). 
-1 


Taking expectation on both sides 
n á - 
E Li- i =m N P(X < Y) - m'(m' — 92 PX. c Y,,X,< Yj} 

i=l i=l -1 


+ - E 7 n! 5 — B. n 7 
=m’ È Té gogo tmr U -G 


n'! 
*m'(m'—1) ps HR Fu 8 ii 
= m'n’ [s P sei Pru. 
0 0 


n’ n m' n^ 'oo 0 
E Lili - = E i K(X, Y) =m’ Y iP(X, c Y) = men n! — nf ro ao am | Fe. dd“, 
i-1 i-1 t=1 i-1 0 0 


[G"(u)]*(1 — G'(u)]" -* G'(u 


Also 


Hence 


w n^ n’ 'oo 00 
E E ri = HEY (r—if-2E Ni-) D 1 m'w f F' dG +m’n'(m’ — 1) T: Fa dd 
i-1 i=1 121 0 


T 2m^n'(n' — 1) 5 Fi dG” F 
0 
v M. = Í e, ($2) 
where the range of integration is from 0 to co. Then for fixed n, m^, we have 
rË i = W+ m^" Mts, (33) 
E E rf = 3m^n' Mio 4- m/?n/ Mio + 9m^n'9 M1, u 4- 1) (2n’ +1), (3:4) 


n’ n’ 
E p TP min Myo + 6m/9n' Mz, 4-m/9n' Mig + 19m/n/9 M: 11 HIMN M, + 3n/n"9 Mis + 2 i, 
i=1 2s 


(3:5) 
E Sr if = 15m/n/ Mio + 25m’ n’ Mio 4- 10m/9n' Miot m/9n' Mio 


i=1 
+ 50m’n'® M1, + 3MM DM, + Ah zi 
+30m‘n’ M1, + 6m/9n/9 M, + Am ist 8 $t, (3-0) 
E É rir; = m ^n'3(3M1, —2M1] mA 993 MA m n Mig + p ij (37), 
E b rrj = m'n'?[18M1,— 12M1;] - m'*n/94 M1, — 3M1, + 5M4] 


+m’On Mio Mig ni * 3M1]-- m'/e(5M;, — 6M14-- 8 MI] 
m/Oy/9 9M; Mts + 4M io) + E itj, 9) 


E É rer? = m'n'$[27 Mt, 22 M(]-- m TC —830M1, ＋ 3211] 
T 2m'0y9 Mi Mii +m ^n'"[9 M1, + Mj,— 47 
T m^ Ona MS + 160 — 1 5M] - m'9n'[9 M1 4- $M bo] + mn’ M8 


+ mean] aut, Min +6 fre derta) f" Fu) as) | 4m/n'93M3 + E 19 
0 < 


recen pati ~3Mia+ 6M jo M +12 [ F(x) d f O aa | „ 
0 
Replacing prime by double prime, we obtain the corresponding results for functions involving $r_j$, where $F''(x) = 2F(x)$ and $G''(x) = 2G(x)$, the range of integration being from $-\infty$ to 0. Using these results we have

ES = Amn{( — Mj --6M1 —4—2M1, 2M3, — Mig — 2M 5) + (Mio — 3M fo + Mio + Mie 2) 


*n(2Mf — 3Mjo+2Mj; Mis 2). (3-10) 
var [S, | n^, m^] m — Mao) 


+m ow ami —8Mio MII- 448 iro arin f’ Paa) 
0 


z 
+ w^ dis — 16 — SMO Mj; - u fro age) F'(u)G'(u) age) | 
0 
mm- MII- AM H- AMIsIT 8%m’2n’} [Mio — M15] -*m/n^][2M1, Mi — 2M i] 
em“ inꝰ:ſ 2 M10 + 2M ig — 2M 1 M 4 MII Mio] + M= — Mio Mio) 
+sm'n’[Mio— 3M + 2M1 2M1 Mio), (3:11) 
var[S, | n",m^] xmn" [Mis — M33 -AM$ — 4 Mis — 4M$o + 4M io Mio) 
4 m^n"i( — 12 M II- 4M7 4 Mf + AMT, + 12M + 8M fi Mf — 4M T6] 
z 
+m" in"? [ ams, —4Mi3+12 i F” (x) dG" (2) f Frau d@"(u) 
-0 
—8M1, M1 ＋ 8M1, — 8M, — 20 M18 + 12M fp Mio + 16M f; Mi su | 
zr 
men“ [ —16M1i 4-24 fro aea) f F"(u) G" (u) d@"(u) 
-0 


+12M% — 84720 M% — 24Mf, — 20M - 12M fS + 24M fs Mio + sarmi | 
+82m”2n” MM f — M13] + s$m^n^*4[2M3, — 2Min -M 1o] 
sm" ?n"[4M1, — 4M] — 6770 ＋ 4M1; M0 t+ 2Mio Mio] 
7 7 z » Are 
em"n^[3M1, — 6M7, — 21d + 3 M2 + 2M Mio] +3m"n 2 A720 - 2M f, — Moo + Mio Mio], 


(3:12) 
var [E[S | n, m’]] & e D - SPU - D QP) +n {(I+ EP -- SPU E 2P)) 
1 4-mn*[312 + AE? - 48P? -A4DI + 16DB--6EI — 16AE 4 24IP +4DE +32EP+8DP) 
nen Id Le 4D* + 48P? + 4EI + 1624 + 6DI—16BD + 241P + 4DE+32DP+8EP}}, (839) 
where 4 = 0 21 L 1, B= 2MII- 2M E I. D= 1 50 — Mint 210 —1 


B= 21-2311 . 24710— 1, 1 Mi- 1, P= iHMio~1)- 
4. Consistency of the S test. We observe that $S$ is a sum of several generalized U statistics. From Lehmann's (unpublished) result on the asymptotic normality of generalized U statistics, it follows that $S$ is asymptotically normally distributed both under the hypothesis and under the alternative.

Let the critical region under the null hypothesis be

$$E_0 S - S > t_n \sigma_0,$$

where $\lim_{n\to\infty} t_n = t$. Then

$$P\{E_0 S - S > t_n \sigma_0 \mid H_1\} = P\{E_1 S - S > k \sigma_1\}, \qquad k = (t_n \sigma_0 + E_1 S - E_0 S)/\sigma_1 .$$

Using Tchebycheff's inequality and the expressions for $E_1 S$ and $\sigma_1^2$ given above, it follows that

$$\lim_{m,n\to\infty} P\{E_0 S - S > t_n \sigma_0 \mid H_1\} = 1,$$

which is the requirement for consistency.
5. Asymptotic efficiency of the S test. Putting $F(x) = G(\theta x)$ in (3·1) and proceeding as in Mood (1954), we have
8 = mn(s—2) Bl a (ac) g?(2) da-f xg*(x) dx |. 
00 8-71 —0⁰ Im 
Efficacy of the S test is therefore 

150 [2 É wate) ote) do [^ annee] s 
61 s* -0 = 


548 Miscellanea f 
Also the efficacy of the variance ratio F test is

$$\frac{4mn}{(m+n)(\beta_2 - 1)}, \qquad (5{\cdot}2)$$

where

$$\beta_2 = \frac{\int (x - Ex)^4 \, dF(x)}{\left[\int (x - Ex)^2 \, dF(x)\right]^2}.$$


Hence, the asymptotic relative efficiency of the S test with respect to the variance ratio F test is given by


720 ao 0 ; 2 I 
es = cul (5 — 1) F xg (2) ar | . (53) 


From formula (5·3) it is obvious that, depending on $g(x)$, $0 < e_{S,F} < \infty$. In particular, if $g(x)$ is the standard normal density function, $e_{S,F} = 0{\cdot}69$; if $g(x) = 1$ for $-\tfrac12 < x < \tfrac12$, then $e_{S,F} = 0{\cdot}80$. From Sukhatme (1957), the asymptotic relative efficiency of the T test with respect to the F test is given by

o 0 
en, = 1-1) [f ag'(2)da — J sorta) ae |.. (54) 
0 -0 

It follows that the asymptotic relative efficiency of the S test with respect to the T test is given by


0 0 
60 2/ be gay de Í mg? (a) dæ : 


517 81 E + (5:5) 
f xg?(x) d I ag?(x) da: 
» 0 -o0 
From the above discussion it is seen that the test is reasonably efficient for normal alternatives and highly efficient for some non-normal alternatives. The test, however, presupposes knowledge about the location of the two populations, which is not always available. We therefore modify the test by applying it to the deviations of the observations from the sample medians rather than to the observations themselves. The modified test statistic may be defined as


> n m n m 
S=) € Q(X,-X,X,-X,Y,- Y)42Y Q(X,- X, Y,— Y, Y,- Y) 
i=l j+k i+pj=1 
n m 
$ i i) X SKZ- N. , A 
2 foaled 
where $\tilde X$ and $\tilde Y$ are the sample medians.
We observe that $\tilde S$ is a sum of several modified generalized U statistics as defined in Sukhatme (1958). The necessary and sufficient conditions for a modified generalized U statistic to have the same asymptotic normal distribution as the original generalized U statistic are stated in a theorem derived by Sukhatme (1958). These conditions are satisfied in the present case. It follows that $S$ and $\tilde S$ have the same asymptotic normal distribution. Hence the test based on $\tilde S$ is asymptotically distribution-free.
Summary. A two-sample distribution-free test has been proposed for comparing variances, and a general formula has been derived for its asymptotic efficiency with respect to the variance ratio F test. The test presupposes knowledge about the relative location of the two populations. In case this is not available, the test can be suitably modified, and it has been shown that under certain regularity conditions the modified test is asymptotically distribution-free.
The mean difference and the mean deviation of some discontinuous distributions


By T. A. RAMASUBBAN
London School of Economics and Political Science


1. INTRODUCTION
Although much attention has been paid to the study of the mean difference and the mean deviation of the normal distribution, very little seems to be known about these statistics for some of the important discontinuous distributions, such as the hypergeometric, the binomial and the Poisson. This paper is an attempt in this direction.


The usual definitions of the mean difference $\Delta$ and the mean deviation $\delta$ (about the mean $\mu$) may be considered as particular cases of more general ones, which can be defined as follows:

$$\Delta_r = \sum_i \sum_j |i - j|^r \, h(i)\, h(j), \qquad (1{\cdot}1)$$

$$\delta_r = \sum_i |i - \mu|^r \, h(i), \qquad (1{\cdot}2)$$

where the summations are with respect to all the admissible discrete values of $i$ and $j$, and $h(t)$ refers to the discontinuous frequency function at $i$ or $j$ equal to $t$. Further, associated with $\Delta_r$ and $\delta_r$, let

$$D_r = \frac{\Delta_r}{\mu_2^{r/2}}, \qquad (1{\cdot}3)$$

$$G_r = \frac{\delta_r}{\mu_2^{r/2}}, \qquad (1{\cdot}4)$$

where $\mu_2$ is the variance of the distribution. It will be noted that the ratio $G_r$ is an extension of Geary's ratio, which is equal to $G_1$.

In this paper I shall confine my attention to $\Delta_1$ and $\delta_1$, together with the corresponding ratios $D_1$ and $G_1$. In a later paper I shall deal with $\Delta_{2r}$, $\Delta_{2r+1}$ and $\delta_{2r+1}$ for $r > 0$. Consideration of $\delta_{2r}$ is unnecessary, since it is quite obvious that $\delta_{2r} = \mu_{2r}$, the $2r$th moment about the mean.


2. MEAN DIFFERENCE ($\Delta_1$)
With $\Delta_1$ defined in (1·1),

$$\Delta_1 = \sum_i \sum_j |i - j| \, h(i)\, h(j) = 2 \sum_i \sum_{j < i} (i - j)\, h(i)\, h(j). \qquad (2{\cdot}1)$$

Using expression (2·1), I now proceed to find $\Delta_1$ for the binomial, the negative binomial, the Poisson, the logarithmic and the geometric distributions.


(i) The binomial distribution
The frequency function of this distribution is

$$h(i) = \binom{n}{i} p^i q^{n-i} \quad (i = 0, 1, \ldots, n), \qquad (2{\cdot}2)$$

with $q + p = 1$. The mean difference is therefore given by

$$\Delta_1 = 2 \sum_{i=0}^{n} \sum_{j=0}^{i} (i - j) \binom{n}{i} \binom{n}{j} p^{i+j} q^{2n-i-j}. \qquad (2{\cdot}3)$$


ase Biom. 45 
35 
Summing now over $i$, term by term,


ATC) Ir 46) (2) pe 00) ee 
e) (edes) (25) +9(:"s) (4) 20% (7 
+{() (E (20 G2 (5) e »()) (as | 

-. ER a 


EL i = n—1/m—1\2 
Arc Sey p k ‘) i à Jong tà (" . ) pgr- :] (24) 


i-1 i 
$= 2npq(A + B)$, say.


It is then not difficult to show that A is the coefficient of (7! in {(q + pt) (q+ pt)", i.e. the coefficient 
of tu in (t pq(1— t)?)"-* and hence that 


^- GC e QC re QS) E 
r E 


Substitution of (2-5) and (2-6) in (2-4) yields, after some further reduction 


$$\Delta_1 = 2npq \sum_{i=0}^{n-1} (-1)^i \binom{n-1}{i} \frac{(2i)!}{i!\,(i+1)!} (pq)^i \qquad (2{\cdot}7)$$
n-i on S 
= = i 2:8 
20 Y, ( »( (2) en us 


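If we expand (2·7) and rearrange, we also find

$$\Delta_1 = 2npq\left[1 - (n-1)pq + (n-1)(n-2)(pq)^2 - \cdots\right] = 2npq \; {}_2F_1(-n+1, \tfrac12; 2; 4pq), \qquad (2{\cdot}9)$$

where

$${}_2F_1(\alpha, \beta; \gamma; x) = 1 + \frac{\alpha\beta}{\gamma}\,\frac{x}{1!} + \frac{\alpha(\alpha+1)\beta(\beta+1)}{\gamma(\gamma+1)}\,\frac{x^2}{2!} + \cdots .$$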
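As a numerical check of (2·7), and hence of (2·9), the following Python sketch compares the defining double sum for $\Delta_1$ with the alternating series. It is an illustration only: the function names are hypothetical helpers introduced here, and everything is computed in floating point, so agreement is only to rounding error.

```python
from math import comb, factorial


def binomial_pmf(n, p):
    """Binomial frequencies h(i) = C(n, i) p^i q^(n-i), i = 0, ..., n."""
    q = 1.0 - p
    return [comb(n, i) * p**i * q**(n - i) for i in range(n + 1)]


def mean_difference_direct(pmf):
    """Delta_1 = sum_i sum_j |i - j| h(i) h(j), for a pmf on 0, 1, 2, ..."""
    return sum(abs(i - j) * hi * hj
               for i, hi in enumerate(pmf)
               for j, hj in enumerate(pmf))


def binomial_mean_difference_series(n, p):
    """Series (2.7): 2npq * sum_i (-1)^i C(n-1, i) (2i)! / (i! (i+1)!) (pq)^i."""
    q = 1.0 - p
    total = sum((-1) ** i * comb(n - 1, i) * factorial(2 * i)
                / (factorial(i) * factorial(i + 1)) * (p * q) ** i
                for i in range(n))
    return 2 * n * p * q * total


if __name__ == "__main__":
    for n, p in [(2, 0.3), (10, 0.5), (25, 0.1)]:
        print(n, p,
              round(mean_difference_direct(binomial_pmf(n, p)), 6),
              round(binomial_mean_difference_series(n, p), 6))
```

For $n = 2$ both reduce to $4pq(1 - pq)$. For large $n$ with $p$ near ½ the alternating series loses accuracy through cancellation in floating-point arithmetic.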


From the computational point of view, however, relation (2·4) is to be preferred, since the numerical values of terms like $\binom{N}{i} p^i q^{N-i}$, which make up this relation, are available for different $N$ and $i$ in Biometrika Tables for Statisticians, 1 (1954 edition). Table 1, tabulated below for some typical values of $n$ and $p$, gives an idea of the behaviour of $\Delta_1$ for both small and large values of $n$ and $p$.


(ii) The negative binomial distribution
For this distribution

$$h(i) = q^{-n} \binom{n+i-1}{i} \left(\frac{p}{q}\right)^{i} \quad (i = 0, 1, 2, \ldots), \qquad (2{\cdot}10)$$

where $q - p = 1$ and $n > 0$.
Table 1. Values of $\Delta_1$ for the binomial distribution


Table 2. Values of $D_1 = \Delta_1/\sqrt{\mu_2}$


Values of $D_1 = \Delta_1/\sqrt{\mu_2}$, where $\mu_2 = npq$, are given in Table 2 for the same $n$ and $p$ as in Table 1. They will be seen to approach the limiting value for the normal distribution, viz. $2/\sqrt{\pi} = 1{\cdot}1284$, most rapidly as $n$ increases when $p$ is in the neighbourhood of 0·5.
Omitting the algebraic details, which are exactly similar to those for the positive binomial distribution, it may be shown that

$$\Delta_1 = 2npq \sum_{i=0}^{\infty} (-1)^i \binom{n+i}{i} \frac{(2i)!}{i!\,(i+1)!} (pq)^i \qquad (2{\cdot}11)$$

$$= 2npq \; {}_2F_1(n+1, \tfrac12; 2; -4pq). \qquad (2{\cdot}12)$$

It will be seen that (2·12) is obtained by changing the sign of $n$ and $p$ in (2·9), demonstrating the same equivalence as holds for the moments of these two distributions.
(iii) The Poisson distribution
The density function being given by

$$h(i) = \frac{e^{-\lambda} \lambda^i}{i!} \quad (i = 0, 1, 2, \ldots),$$


ESSE 
Boro; AUC a 


o jit i * re *(2 A )] 
=a A Po Bi Aum 
wo A sith Ri EET 
E Enee A 


o Aw m d tt 
gs btt p 


This also follows by proceeding to the limit in (2·4). By considering the limit in (2·8), we have an interesting alternative form for $\Delta_1$, viz.

$$\Delta_1 = 2\lambda \sum_{i=0}^{\infty} (-1)^i \frac{(2i)!\,\lambda^i}{(i!)^2 (i+1)!}. \qquad (2{\cdot}15)$$
Forms (2-14) and (2-15) establish an identity 
LÀ Av A2 o 2i Ain 
-2A 3 2-1 j 
= [X lia wr 50 P # E Dr dug 
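If $I_n(x)$ is the modified Bessel function of the first kind, defined by

$$I_n(x) = \sum_{r=0}^{\infty} \frac{(x/2)^{n+2r}}{r!\,(n+r)!},$$

we find for the Poisson distribution, from (2·14),

$$\Delta_1 = 2\lambda e^{-2\lambda} \{ I_0(2\lambda) + I_1(2\lambda) \}. \qquad (2{\cdot}18)$$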
2 2i) An 
Also, if Aya zi SIM 
fo) P ) (asm 
E A PU 
dA 2E ) (| il 
and hence from (2-16) and (2-17) 
dj d 
df o E Ae 22) 01 
| = 2e-3AI,4(22), 
giving 
A 
f(A) = 24 e 21. 2A) dA 
0 
and $$\Delta_1 = 2 \int_0^{\lambda} e^{-2\lambda} I_0(2\lambda)\, d\lambda. \qquad (2{\cdot}19)$$
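Still another form for $\Delta_1$ is possible. Consider expression (2·15). This can be slightly adjusted to give

$$\Delta_1 = 2\lambda \; {}_1F_1(\tfrac12; 2; -4\lambda). \qquad (2{\cdot}20)$$

Again, from (2·19) and (2·20),

$$\int_0^{\lambda} e^{-2\lambda} I_0(2\lambda)\, d\lambda = \lambda \; {}_1F_1(\tfrac12; 2; -4\lambda). \qquad (2{\cdot}21)$$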


Table 3 shows the values of $\Delta_1$ and $D_1$ for different $\lambda$. $\Delta_1$ has been obtained using expression (2·18). The values of $I_0(2\lambda)$ and $I_1(2\lambda)$ have been taken from the British Association Mathematical Tables, 10, Bessel Functions Part II.


Table 3. Values of $\Delta_1$ and $D_1$ for the Poisson distribution

   λ       Δ1       D1    |     λ       Δ1       D1
 0·05   0·0952   0·4259   |   1·50   1·3195   1·0774
 0·10   0·1818   0·5750   |   2·00   1·5430   1·0911
 0·15   0·2610   0·6739   |   3·00   1·9123   1·1040
 0·20   0·3337   0·7461   |   4·00   2·2206   1·1103
 0·25   0·4007   0·8015   |   5·00   2·4910   1·1140
 0·50   0·6737   0·9527   |   7·50   3·0641   1·1188
 1·00   1·0476   1·0476   |  10·00   3·5458   1·1213
It will be seen that, while differing considerably from it for small values of $\lambda$, $D_1$ tends to the normal value of 1·1284 as $\lambda$ increases.
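The closed form (2·18) can be checked against Table 3 and against the definition of $\Delta_1$. The Python sketch below is an illustration only. `bessel_i` is a hypothetical helper that sums the power series for $I_n$, in place of tables or a special-function library. The alternating series (2·15) is evaluated only for small $\lambda$, because for large $\lambda$ its terms become very large and cancel.

```python
from math import exp, sqrt


def bessel_i(nu, x, terms=80):
    """Modified Bessel function I_nu(x), integer nu >= 0, summed from its power series."""
    term = (x / 2) ** nu
    for k in range(1, nu + 1):
        term /= k                                   # first term: (x/2)^nu / nu!
    total = 0.0
    for r in range(terms):
        total += term
        term *= (x / 2) ** 2 / ((r + 1) * (r + 1 + nu))   # ratio of consecutive terms
    return total


def poisson_mean_difference(lam):
    """Closed form (2.18): Delta_1 = 2 lam e^(-2 lam) [I_0(2 lam) + I_1(2 lam)]."""
    x = 2.0 * lam
    return x * exp(-x) * (bessel_i(0, x) + bessel_i(1, x))


def poisson_mean_difference_series(lam, terms=60):
    """Alternating series (2.15); only sensible for small lam."""
    t, total = 1.0, 0.0
    for i in range(terms):
        total += t
        t *= -lam * 2 * (2 * i + 1) / ((i + 1) * (i + 2))   # t_{i+1} / t_i
    return 2.0 * lam * total


def poisson_mean_difference_direct(lam, cutoff=150):
    """Truncated double sum of |i - j| h(i) h(j), with h(i) built recursively."""
    pmf = [exp(-lam)]
    for i in range(1, cutoff):
        pmf.append(pmf[-1] * lam / i)
    return sum(abs(i - j) * a * b
               for i, a in enumerate(pmf) for j, b in enumerate(pmf))


if __name__ == "__main__":
    for lam in (0.05, 1.0, 10.0):
        d = poisson_mean_difference(lam)
        print(lam, round(d, 4), round(d / sqrt(lam), 4))
```

For λ = 0·05, 1 and 10 the sketch agrees with the tabulated $\Delta_1$ and $D_1$ to within about one unit in the fourth decimal place.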
(iv) The logarithmic distribution
For the logarithmic distribution,

$$h(i) = \frac{\beta \alpha^i}{i} \quad (i = 1, 2, \ldots), \qquad (2{\cdot}22)$$

where $\beta = -1/\log_e(1 - \alpha)$ and $0 < \alpha < 1$.
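Thus

$$\Delta_1 = 2 \sum_{i<j} (j - i)\, \frac{\beta^2 \alpha^{i+j}}{ij},$$

which reduces to

$$\Delta_1 = \frac{2}{\{\log(1-\alpha)\}^2}\,(A - B), \qquad (2{\cdot}23)$$

where

$$A = \sum_{i=1}^{\infty} \sum_{j=i+1}^{\infty} \frac{\alpha^{i+j}}{i}, \qquad (2{\cdot}24)$$

$$B = \sum_{j=2}^{\infty} \sum_{i=1}^{j-1} \frac{\alpha^{i+j}}{j}. \qquad (2{\cdot}25)$$

Term-by-term summation gives

$$A = -\frac{\alpha \log(1 - \alpha^2)}{1 - \alpha}, \qquad B = \frac{\log(1 - \alpha^2) - \alpha \log(1 - \alpha)}{1 - \alpha}.$$

Substituting these values in (2·23),

$$\Delta_1 = \frac{2}{(1-\alpha)\{\log(1-\alpha)\}^2}\left[-\log\{(1-\alpha)(1+\alpha)^{1+\alpha}\}\right]. \qquad (2{\cdot}26)$$

$\Delta_1$ and $D_1$ for various values of $\alpha$ in the range 0·1 (0·1) 0·9 are tabulated in Table 4.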


Table 4. $\Delta_1$ and $D_1$ for the logarithmic distribution

  α      0·1     0·2     0·3     0·4     0·5     0·6     0·7     0·8     0·9
  Δ1   0·1061  0·2194  0·3506  0·5080  0·6958  0·9784  1·3885  2·1288  4·0856
  D1   0·4341  0·5756  0·6712  0·7395  0·7776  0·8231  0·8444  0·8515  0·8373


An interesting feature can be observed in the above table, viz. that $D_1$ increases with $\alpha$ up to $\alpha = 0.8$ and decreases at $\alpha = 0.9$, suggesting a maximum value for $D_1$ in the range $0.8 < \alpha < 0.9$.
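The following Python sketch checks (2·26) against a truncated version of the defining double sum. The truncation point `cutoff` is an arbitrary choice for illustration, and $D_1$ uses the standard log-series variance $\beta\alpha/(1-\alpha)^2 - \mu^2$, which is an assumption here because the variance is not written out in this section.

```python
from math import log, sqrt


def log_mean_difference_closed(alpha):
    """(2.26): 2[-log(1-a) - (1+a) log(1+a)] / [(1-a) {log(1-a)}^2]."""
    L = log(1.0 - alpha)
    return 2.0 * (-L - (1.0 + alpha) * log(1.0 + alpha)) / ((1.0 - alpha) * L * L)


def log_mean_difference_direct(alpha, cutoff=500):
    """Truncated 2 * sum_{i<j} (j - i) h(i) h(j), with h(i) = beta alpha^i / i."""
    beta = -1.0 / log(1.0 - alpha)
    h = [beta * alpha**i / i for i in range(1, cutoff)]   # h[k] holds h(k + 1)
    total = 0.0
    for a in range(len(h)):
        for b in range(a + 1, len(h)):
            total += (b - a) * h[a] * h[b]
    return 2.0 * total


def log_d1(alpha):
    """D_1 = Delta_1 / sqrt(mu_2), using the log-series mean and variance."""
    beta = -1.0 / log(1.0 - alpha)
    mean = beta * alpha / (1.0 - alpha)
    var = beta * alpha / (1.0 - alpha) ** 2 - mean ** 2
    return log_mean_difference_closed(alpha) / sqrt(var)


if __name__ == "__main__":
    for alpha in (0.1, 0.5, 0.9):
        print(alpha, round(log_mean_difference_closed(alpha), 4), round(log_d1(alpha), 4))
```

For α = 0·9 the sketch reproduces the Table 4 entries $\Delta_1 = 4{\cdot}0856$ and $D_1 = 0{\cdot}8373$.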


(v) The geometric distribution
In this case,

$$h(i) = q p^i \quad (i = 0, 1, 2, \ldots), \qquad (2{\cdot}27)$$

where $q + p = 1$. By a straightforward simplification,

$$\Delta_1 = \frac{2p}{q(1+p)} = \frac{2p}{1 - p^2}. \qquad (2{\cdot}28)$$

Also $\mu_2 = p/q^2$, so that

$$D_1 = \frac{2\sqrt{p}}{1 + p}, \qquad (2{\cdot}29)$$

and $\lim_{p \to 1} D_1 = 1$.

Using relations (2·28) and (2·29), $\Delta_1$ and $D_1$ have been computed for different values of $p$ and are given in Table 5.
Table 5. Values of $\Delta_1$ and $D_1$ for the geometric distribution

   p        Δ1        D1
 0·4     0·9524    0·9035
 0·5     1·3333    0·9428
 0·6     1·8750    0·9682
 0·7     2·7451    0·9843
 0·8     4·4444    0·9938
 0·9     9·4737    0·9986
 0·98   49·4949    0·9999

The approach of $D_1$ to the limiting value of unity as $p \to 1$ is very clear from the above table.
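The geometric results (2·28) and (2·29) are simple enough to check directly. The sketch below is illustrative only. The helper `mean_difference_from_cdf` uses the identity $E|X - Y| = 2\sum_k F(k)\{1 - F(k)\}$ for independent integer-valued $X$, $Y$ with common c.d.f. $F$, which is a standard device and not the paper's derivation. The values of $p$ attached to the rows of Table 5 are the ones implied by (2·28).

```python
from math import sqrt


def geometric_mean_difference(p):
    """(2.28): Delta_1 = 2p / (q (1 + p)) = 2p / (1 - p^2), for h(i) = q p^i."""
    return 2.0 * p / (1.0 - p * p)


def geometric_d1(p):
    """(2.29): D_1 = Delta_1 / sqrt(p / q^2) = 2 sqrt(p) / (1 + p)."""
    return 2.0 * sqrt(p) / (1.0 + p)


def mean_difference_from_cdf(cdf_values):
    """For i.i.d. integer-valued X, Y: E|X - Y| = 2 * sum_k F(k) (1 - F(k))."""
    return 2.0 * sum(F * (1.0 - F) for F in cdf_values)


def geometric_mean_difference_direct(p, cutoff=5000):
    # F(k) = P(X <= k) = 1 - p^(k+1) for h(i) = q p^i, i = 0, 1, 2, ...
    return mean_difference_from_cdf(1.0 - p ** (k + 1) for k in range(cutoff))


if __name__ == "__main__":
    for p in (0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.98):
        print(p, round(geometric_mean_difference(p), 4), round(geometric_d1(p), 4))
```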
3. MEAN DEVIATION ($\delta_1$)
From (1·2),

$$\delta_1 = \sum_i |i - \mu|\, h(i) = 2 \sum_{i=m}^{\infty} (i - \mu)\, h(i), \qquad (3{\cdot}1)$$

where $m$ is the smallest integer greater than $\mu$ if $\mu$ is not an integer, and equal to $\mu$ if it is itself an integer.

Johnson (1957) has recently obtained the value of $\delta_1$ (and $G_1$) for the binomial distribution. I have therefore considered the other distributions, viz. the hypergeometric, the negative binomial, the Poisson, the logarithmic and the geometric distributions.


(i) The hypergeometric distribution
The frequency function of this distribution is

$$h(i) = \frac{\binom{Np}{i}\binom{Nq}{n-i}}{\binom{N}{n}} \quad (i = 0, 1, \ldots, n), \qquad (3{\cdot}2)$$

and the mean is $\mu = np$. Therefore, from (3·1),

$$\delta_1 = 2 \sum_{i=m}^{n} (i - np)\, h(i).$$

The above sum can be easily evaluated by replacing $(i - np)$, and $h(i)$ as defined in (3·2), by equivalent expressions in the factorial notation

$$a^{(r)} = a(a-1)(a-2)\cdots(a-r+1).$$

After reduction,

$$\delta_1 = \frac{2m(Nq - n + m)}{N}\, \frac{\binom{Np}{m}\binom{Nq}{n-m}}{\binom{N}{n}}. \qquad (3{\cdot}3)$$
It is readily verified that as $N \to \infty$ this tends to the value for the binomial distribution.

(3·3) has a very interesting property. Suppose $n$ and $n'$ are two integers such that $n + n' = N$, and further let $m = np$ and $m' = n'p$ be integers. Then, from (3·3),

$$\delta_1 \ (\text{corresponding to } n') = \frac{2(Np - np)(n - np)}{N}\, \frac{\binom{Np}{Np - np}\binom{Nq}{N - n - Np + np}}{\binom{N}{n}} = \frac{2np(N - n)q}{N}\, \frac{\binom{Np}{np}\binom{Nq}{nq}}{\binom{N}{n}},$$

$$\delta_1 \ (\text{corresponding to } n) = \frac{2np(Nq - n + np)}{N}\, \frac{\binom{Np}{np}\binom{Nq}{nq}}{\binom{N}{n}}, \quad \text{since } m = np.$$

Since $Nq - n + np = (N - n)q$, $\delta_1$ (corresponding to $n'$) $= \delta_1$ (corresponding to $n$). Moreover, the variance $\mu_2 = npq(N - n)/(N - 1)$ remains unaltered by substituting $n'$ for $n$. Thus the values of $G_1$ corresponding to $n$ and $n'$ are equal.
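The sketch below uses exact rational arithmetic (Python's `fractions`) to compare the direct mean deviation with (3·3) and to check the $n \leftrightarrow N - n$ symmetry. The helper names are hypothetical. The symmetry check also passes when $np$ is not an integer, which is consistent with the fact that the number of successes among the $N - n$ units not drawn is $Np - X$.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(N, K, n):
    """Exact h(i) = C(K, i) C(N - K, n - i) / C(N, n), i = 0, ..., n (K = Np, N - K = Nq)."""
    total = comb(N, n)
    return [Fraction(comb(K, i) * comb(N - K, n - i), total) for i in range(n + 1)]


def hypergeom_md_direct(N, K, n):
    """delta_1 = sum_i |i - np| h(i), with np = nK/N kept as an exact fraction."""
    mean = Fraction(n * K, N)
    return sum(abs(i - mean) * h for i, h in enumerate(hypergeom_pmf(N, K, n)))


def hypergeom_md_closed(N, K, n):
    """(3.3): delta_1 = 2 m (Nq - n + m) h(m) / N.

    m is taken as floor(np) + 1; when np is an integer the text uses m = np,
    but the i = np term of (3.1) is zero, so both choices give the same value.
    """
    mean = Fraction(n * K, N)
    m = mean.numerator // mean.denominator + 1
    if m > n:                      # only when np = n (K = N): degenerate distribution
        return Fraction(0)
    h_m = Fraction(comb(K, m) * comb(N - K, n - m), comb(N, n))
    return 2 * m * (N - K - n + m) * h_m / N


if __name__ == "__main__":
    N, K = 20, 8                   # Np = 8, so p = 2/5
    for n in (3, 5, 15, 17):
        print(n, hypergeom_md_direct(N, K, n), hypergeom_md_closed(N, K, n))
```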


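(ii) The negative binomial distribution
For the negative binomial (2·10), $\mu = np$. Hence

$$\delta_1 = 2 \sum_{i=m}^{\infty} (i - np)\, h(i) = 2 \sum_{i=m}^{\infty} \{iq - (n + i)p\}\, h(i),$$

which on simplification gives

$$\delta_1 = 2m \binom{n + m - 1}{m} \frac{p^m}{q^{n+m-1}}. \qquad (3{\cdot}4)$$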


(iii) The Poisson distribution

$$\delta_1 = \frac{2 e^{-\lambda} \lambda^m}{(m - 1)!}. \qquad (3{\cdot}5)$$

The result for the binomial distribution may easily be shown to tend to this limit as $n \to \infty$, $p \to 0$ and $np \to \lambda$.
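The sketch below checks (3·4) and (3·5) against the definition $\delta_1 = \sum_i |i - \mu| h(i)$. It is illustrative only. The negative binomial frequencies are taken as $h(i) = \binom{n+i-1}{i} p^i / q^{n+i}$ with $q = 1 + p$, which is our reading of (2·10), and $n$ is restricted to integers so that `math.comb` can be used.

```python
from math import comb, exp, factorial, floor


def threshold(mean):
    """Smallest integer above the mean.  When the mean is an integer the text takes
    m = mean instead; the i = mean term of (3.1) is zero, so either choice works."""
    return floor(mean) + 1


def poisson_md_closed(lam):
    """(3.5): delta_1 = 2 e^{-lam} lam^m / (m - 1)!."""
    m = threshold(lam)
    return 2.0 * exp(-lam) * lam ** m / factorial(m - 1)


def negbin_md_closed(n, p):
    """(3.4): delta_1 = 2 m C(n+m-1, m) p^m / q^(n+m-1), with q = 1 + p."""
    q = 1.0 + p
    m = threshold(n * p)
    return 2.0 * m * comb(n + m - 1, m) * p ** m / q ** (n + m - 1)


def md_direct(pmf, mean):
    """delta_1 = sum_i |i - mean| h(i) over a (truncated) pmf on 0, 1, 2, ..."""
    return sum(abs(i - mean) * h for i, h in enumerate(pmf))


def poisson_pmf(lam, cutoff=200):
    pmf = [exp(-lam)]
    for i in range(1, cutoff):
        pmf.append(pmf[-1] * lam / i)
    return pmf


def negbin_pmf(n, p, cutoff=2000):
    """h(i) = C(n+i-1, i) p^i / q^(n+i), built from the ratio h(i+1)/h(i)."""
    q = 1.0 + p
    pmf = [q ** (-n)]
    for i in range(cutoff - 1):
        pmf.append(pmf[-1] * (n + i) / (i + 1) * p / q)
    return pmf


if __name__ == "__main__":
    print(poisson_md_closed(2.5), md_direct(poisson_pmf(2.5), 2.5))
    print(negbin_md_closed(3, 0.3), md_direct(negbin_pmf(3, 0.3), 0.9))
```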


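(iv) The logarithmic distribution
The frequency function is given by (2·22), and the mean is $\mu = \beta\alpha/(1 - \alpha)$. Thus

$$\delta_1 = 2 \sum_{i=m}^{\infty} \left(i - \frac{\beta\alpha}{1 - \alpha}\right) \frac{\beta \alpha^i}{i}.$$

This may readily be shown to reduce to

$$\delta_1 = \frac{2\beta\alpha}{1 - \alpha}\left\{\beta S_m - (1 - \alpha^{m-1})\right\}, \qquad (3{\cdot}6)$$

where

$$S_m = \sum_{i=1}^{m-1} \frac{\alpha^i}{i} \quad (\text{for } m \ge 2), \qquad S_m = 0 \quad (\text{for } m = 1).$$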
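The following Python sketch compares (3·6) with a truncated direct sum of $|i - \mu| h(i)$. The helper names and the truncation point `cutoff` are arbitrary choices for illustration.

```python
from math import floor, log


def log_series_constants(alpha):
    """beta = -1/log(1 - alpha) and the mean beta alpha / (1 - alpha)."""
    beta = -1.0 / log(1.0 - alpha)
    return beta, beta * alpha / (1.0 - alpha)


def log_md_closed(alpha):
    """(3.6): delta_1 = (2 beta alpha / (1 - alpha)) {beta S_m - (1 - alpha^(m-1))}."""
    beta, mean = log_series_constants(alpha)
    m = floor(mean) + 1                                # smallest integer above the mean
    s_m = sum(alpha ** i / i for i in range(1, m))     # S_m; an empty sum gives 0 when m = 1
    return 2.0 * beta * alpha / (1.0 - alpha) * (beta * s_m - (1.0 - alpha ** (m - 1)))


def log_md_direct(alpha, cutoff=2000):
    """Truncated sum of |i - mean| beta alpha^i / i over i = 1, ..., cutoff - 1."""
    beta, mean = log_series_constants(alpha)
    return sum(abs(i - mean) * beta * alpha ** i / i for i in range(1, cutoff))


if __name__ == "__main__":
    for alpha in (0.1, 0.5, 0.9):
        print(alpha, log_md_closed(alpha), log_md_direct(alpha))
```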


(v) The geometric distribution
The mean of the distribution (2·27) is $p/q$. Hence

$$\delta_1 = 2 \sum_{i=m}^{\infty} \left(i - \frac{p}{q}\right) h(i) = 2 m p^m. \qquad (3{\cdot}7)$$


Dr Johnson, who was kind enough to read this section on the mean deviation, has pointed out a remarkable general result which includes the hypergeometric, the binomial, the Poisson and the geometric distributions, viz.

$$\delta_1 = 2\,(\text{variance})\,(\text{frequency function at } m), \qquad (3{\cdot}8)$$

where $m$ is the greatest integer not greater than the mean. He also notes that for continuous distributions, if $m = \mu$, the mean, the relation holds for the normal curve and for a Pearson Type III distribution, but fails to hold for the double exponential and for a Pearson Type VII distribution.

It is hoped to deal with some points arising out of Dr Johnson's suggestion in more detail later. 


In conclusion, I should like to express my thanks to Prof. M. G. Kendall for his helpful advice and encouragement.
The mean deviation of the Poisson distribution


By EDWIN L. CROW 
National Bureau of Standards, Boulder, Colorado 


The recent derivation of the mean deviation of the binomial distribution by Johnson (1957) suggests the 
corresponding derivation for the Poisson distribution. This derivation, in some contrast to that for the 
binomial distribution, is a simple exercise, but the result appears to be new and yields some interesting 
comparisons with the mean deviations of the normal and binomial distributions. 

Let the mean of the Poisson distribution be denoted by m, the largest integer not exceeding m by [m], 
and the mean absolute deviation (from the mean) by MD. Then 


$$\mathrm{MD} = \sum_{r=0}^{\infty} |r - m| \frac{e^{-m} m^r}{r!} = e^{-m}\left\{\sum_{r=0}^{[m]} (m - r)\frac{m^r}{r!} + \sum_{r=[m]+1}^{\infty} (r - m)\frac{m^r}{r!}\right\}$$

$$= \frac{2 e^{-m} m^{[m]+1}}{[m]!} \qquad (1)$$

$$= 2mY, \qquad (2)$$
where $Y$ is the maximum value of the Poisson probability. The form (2) would also result from taking the limit of the mean deviation of the binomial distribution as derived by Frame (1945),

$$\mathrm{MD}_b = 2npq\, Y_{n-1,p}, \qquad (3)$$

where $\mathrm{MD}_b$ denotes the mean deviation of the binomial distribution with parameters $p$, $q = 1 - p$, and $n$, and $Y_{n-1,p}$ is the maximum value of the terms in the expansion of $(q + p)^{n-1}$.
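Frame's formula (3) can be checked numerically against the definition of the mean deviation. In this sketch the helper names are hypothetical, and $Y_{n-1,p}$ is taken as the largest binomial probability for $n - 1$ trials.

```python
from math import comb


def binomial_pmf(n, p):
    q = 1.0 - p
    return [comb(n, i) * p**i * q**(n - i) for i in range(n + 1)]


def binomial_md_direct(n, p):
    """MD_b = sum_i |i - np| h(i)."""
    mean = n * p
    return sum(abs(i - mean) * h for i, h in enumerate(binomial_pmf(n, p)))


def binomial_md_frame(n, p):
    """(3): MD_b = 2 n p q Y_{n-1,p}, Y_{n-1,p} = largest term of (q + p)^(n-1)."""
    return 2.0 * n * p * (1.0 - p) * max(binomial_pmf(n - 1, p))


if __name__ == "__main__":
    for n, p in ((1, 0.5), (3, 0.5), (12, 0.2)):
        print(n, p, binomial_md_direct(n, p), binomial_md_frame(n, p))
```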
The ratio of the mean deviation to the standard deviation is then

$$R(m) = \frac{\mathrm{MD}}{\sqrt{m}} = 2\sqrt{m}\, Y. \qquad (4)$$


Both $R(m)$ and MD can be quickly calculated from existing tables (Molina, 1942; Pearson & Hartley, 1954). The results of such calculation can be summarized as in Fig. 1. It may be confirmed that the discontinuity in $[m]$ causes no discontinuity of $R(m)$, but does cause a discontinuity in its slope at integral values of $m$. Let $m = k + u$, where $0 \le u < 1$, $k = 0, 1, 2, \ldots$. Then

$$\frac{dR(m)}{dm} = \frac{dR(k+u)}{du} = \frac{2}{k!}\, e^{-(k+u)} (k+u)^{k - \frac12} \left(\tfrac12 - u\right), \qquad (5)$$

which is $> 0$ for $0 < u < \tfrac12$, $= 0$ for $u = \tfrac12$, and $< 0$ for $\tfrac12 < u < 1$. In particular $R'(k+0) = k^{-1/2} Y = -R'(k-0)$ $(k = 1, 2, \ldots)$, where $Y = e^{-k}k^k/k!$.

Thus $R(m)$ has infinitely many relative minima, at $m = k$, and infinitely many relative maxima, at $m = k - \tfrac12$, $k = 1, 2, \ldots$. The absolute maximum is 0·8578, at $m = \tfrac12$; the absolute minimum is zero, at $m = 0$.
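A Python sketch of (4), compared with the definition MD/√m computed by direct summation. It also locates the absolute maximum of $R(m)$ on a grid of step 0·01. The function names are hypothetical helpers for illustration.

```python
from math import exp, factorial, floor, sqrt


def poisson_R(m):
    """(4): R(m) = 2 sqrt(m) Y, with Y = e^{-m} m^[m] / [m]!."""
    k = floor(m)
    Y = exp(-m) * m ** k / factorial(k)
    return 2.0 * sqrt(m) * Y


def poisson_R_direct(m, cutoff=200):
    """sum_r |r - m| e^{-m} m^r / r!, divided by sqrt(m); pmf built recursively."""
    pmf = [exp(-m)]
    for r in range(1, cutoff):
        pmf.append(pmf[-1] * m / r)
    return sum(abs(r - m) * p for r, p in enumerate(pmf)) / sqrt(m)


if __name__ == "__main__":
    grid = [i / 100 for i in range(1, 1001)]
    best = max(grid, key=poisson_R)
    print(best, round(poisson_R(best), 4))
```

The grid search returns $m = 0{\cdot}5$ with $R = 0{\cdot}8578$, and the values on either side of each integer agree, as the continuity argument above requires.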


Fig. 1. Ratio of mean deviation to standard deviation for binomial, Poisson and normal distributions, plotted against the mean $m$ (0 to 10). Curves: binomial for fixed $p$ and for fixed $n$ ($n = 1$, $n = 4$); Poisson; normal.
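By applying Stirling's second-term approximation to $k!$ we obtain from (4)

$$R(k+u) = \sqrt{2/\pi}\, \left(1 + \frac{u}{k}\right)^{k+\frac12} \exp\left\{-u - \frac{1}{12k} + O(k^{-3})\right\} \qquad (6)$$

$$= \sqrt{2/\pi}\, \exp\left\{\left(k + \tfrac12\right)\sum_{s=1}^{\infty} \frac{(-1)^{s+1} u^s}{s\,k^s} - u - \frac{1}{12k} + O(k^{-3})\right\}. \qquad (7)$$

Hence

$$R(k+u) = \sqrt{2/\pi}\, \exp\left\{\frac{u - u^2 - \frac16}{2k} + O(k^{-2})\right\}. \qquad (8)$$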


Furthermore, it follows from the alternating series property in (7) that 


1 
An) exp 2 (u—u?—}) «ou» | « R(k 4-u) « (2/7) exp 2 (u- TO »| l 
where $O(k^{-3})$ comes from $k!$ only, and thus introduces an error in $R(k+u)$ of less than 1% for $k \ge 1$. In particular,

$$R(k) < \sqrt{2/\pi} \quad (k = 1, 2, \ldots), \qquad R(k + \tfrac12) > \sqrt{2/\pi} \quad (k = 1, 2, \ldots).$$
Since $R(m)$ is continuous, $R(m)$ therefore takes on the normal limiting value infinitely often. Equation (5) shows that $R(m)$ equals $\sqrt{2/\pi}$ exactly once between $m = k$ and $m = k + \tfrac12$ and exactly once between $m = k + \tfrac12$ and $m = k + 1$, $k = 1, 2, \ldots$; since $R(0) = 0$ and $R(\tfrac12) = 0{\cdot}8578$, this statement also holds for $k = 0$. Equation (8) shows that for sufficiently large $k$ the points of equality are at $u = \tfrac12 \pm \sqrt{3}/6$, and Fig. 1 indicates that these are correct to one decimal place even for $k = 1$.
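By (8),

$$R(k) = \sqrt{2/\pi}\left(1 - \frac{1}{12k}\right) + O(k^{-2}), \qquad (9)$$

$$R(k + \tfrac12) = \sqrt{2/\pi}\left(1 + \frac{1}{24k}\right) + O(k^{-2}). \qquad (10)$$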


Equation (9) is quite accurate even for small values of $k$ (to four decimal places for $k > 8$) and is easily improved in the form (6); (10) is less accurate. Since $R(k+1)$ is also given by precisely the right-hand member of (9), the ratio of the maximum positive difference from $\sqrt{2/\pi}$ to the two adjacent maximum negative differences is asymptotically ½. The value ½ for the ratio of the maximum positive difference to the mean of the two adjacent maximum negative differences is, by calculation from (4), accurate to two figures for $k + \tfrac12 > 2{\cdot}5$; it is in fact better than the asymptotic expressions with either one or two further terms for $k + \tfrac12 < 7{\cdot}5$.
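To see how quickly (9) becomes accurate, this sketch compares the exact $R(k)$ from (4) with the two-term approximations (9) and (10). The helper names are hypothetical. The expected size of the error in (9) is $O(k^{-2})$, so $k^2$ times the error should stay bounded.

```python
from math import exp, factorial, floor, pi, sqrt

NORMAL_RATIO = sqrt(2.0 / pi)          # the normal-curve value sqrt(2/pi)


def poisson_R(m):
    """Exact ratio (4): R(m) = 2 sqrt(m) e^{-m} m^[m] / [m]!."""
    k = floor(m)
    return 2.0 * sqrt(m) * exp(-m) * m ** k / factorial(k)


def approx_eq9(k):
    """(9): R(k) ~ sqrt(2/pi) (1 - 1/(12k))."""
    return NORMAL_RATIO * (1.0 - 1.0 / (12 * k))


def approx_eq10(k):
    """(10): R(k + 1/2) ~ sqrt(2/pi) (1 + 1/(24k))."""
    return NORMAL_RATIO * (1.0 + 1.0 / (24 * k))


if __name__ == "__main__":
    for k in (1, 2, 4, 8, 16):
        print(k, round(poisson_R(k), 5), round(approx_eq9(k), 5),
              round(poisson_R(k + 0.5), 5), round(approx_eq10(k), 5))
```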
In addition to the recurring exact equivalence of the Poisson distribution to the normal distribution with respect to $R(m)$, there would be difficulty in distinguishing the distributions by use of $R(m)$ because of sampling fluctuations. This may be confirmed by examining the percentage points of the distribution of the statistic $a$ = (mean deviation)/(standard deviation) for samples from a normal population (Pearson & Hartley, Table 34A). The upper 5% point for sample size $n = 36$ is 0·8578, which is also the absolute maximum of $R(m)$, attained at $m = 0{\cdot}5$. The lower 5% point for $n = 36$ is 0·7440, which exceeds $R(m)$ only for $m < 0{\cdot}21$ and $0{\cdot}98 < m < 1{\cdot}02$. For $n = 1001$ the upper and lower 5% points are 0·8090 and 0·7869, which include between them the $R(m)$ maxima for $m > 3$ and the $R(m)$ minima for $m > 6$. Since the Poisson distribution is quite asymmetrical for $m < 4$, say, and is J-shaped for $m < 1$, the ratio of mean deviation to standard deviation is a poor criterion for distinguishing the Poisson and normal distributions, or for determining whether a Poisson distribution is approximated by the normal. The matter is rather academic, since there seems little need or likelihood of such application.
The ratio of mean deviation to standard deviation of the binomial distribution follows a quite similar damped oscillation as the mean increases. This can be confirmed by examining Johnson's asymptotic formula (4) for $R$ in the manner in which the exact formula (4) for the Poisson distribution was examined above. When the mean $m = np$ is varied by varying $p$ with $n$ fixed, $R$ varies continuously between minima less than $\sqrt{2/\pi}$, with discontinuous but finite slopes at integral $m$, and smooth maxima greater than $\sqrt{2/\pi}$ at $m$ near $\tfrac12, \tfrac32, \ldots, n - \tfrac12$. The graph is symmetric about $m = \tfrac12 n$. Only minimum values occur in Johnson's table. The asymptotic ratio (for large $n$ and $p < \tfrac12$) of the maximum positive difference from $\sqrt{2/\pi}$ to the absolute values of the adjacent maximum negative differences is


TE 
l1- pq’ 
which approaches the Poisson value as $p$ approaches zero but is unity for $p = \tfrac12$. The behaviour for small $n$ is illustrated in Fig. 1 by the representative case $n = 4$ and the extreme case $n = 1$. Since exact calculation confirms for small $n$ the qualitative behaviour indicated by the asymptotic formula for $R$, we may conclude in particular that for any fixed $n$ there are exactly $2n$ values of $p$ for which $R$ for the binomial distribution equals $\sqrt{2/\pi}$.

If p is fixed while n varies, then m = p, 2p, ..., and R does not have a t. 
of m. The behaviour can be examined by considering n contin with continuous eih function 
The cases p = 4 and p = } are plotted in Fig. 1. The upper bound of the ratio of ape eceding. E 
standard deviation for any distribution is attained with p = $, n = 1. mean deviation 


It is a pleasure to acknowledge helpful suggestions by M. M. Siddiqui. 
Note on the characteristic function of a serial-correlation distribution


By ROY LEIPNIK 
Test Department, US NOTS, China Lake, California 


1. Kendall (1957) and White (1957) have recently derived formulae for the moments of a two-parameter distribution which has appeared in several studies of serial correlation (Leipnik, 1947; Quenouille, 1948). In an unpublished paper, the writer found a Neumann-type series for the characteristic function of this distribution for a generalized parameter range. It may now be of interest to reproduce this result here, along with compact expressions for the moments derived therefrom.


2. Let the function $f = f_{\lambda,\rho}$ be defined for $|x| < 1$, $\lambda > 0$, $|\rho| < 1$ by
C om 


~ rgpra«o 
This is clearly non-negative in the given ranges for $x$, $\lambda$, $\rho$. The Fourier transform $\phi$ may be calculated as follows.
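The generating function for the orthogonal polynomials $C_n^{\lambda}$ of Gegenbauer is

$$(1 - 2\rho x + \rho^2)^{-\lambda} = \sum_{n=0}^{\infty} \rho^n C_n^{\lambda}(x), \qquad (2)$$

which converges for $|\rho| < 1$, $|x| \le 1$. It is known (Erdélyi) that, for $\lambda > 0$ and $|x| \le 1$,

$$|C_n^{\lambda}(x)| \le C_n^{\lambda}(1) = \frac{\Gamma(2\lambda + n)}{n!\,\Gamma(2\lambda)}.$$

Hence (2) converges uniformly and absolutely in the region $|x| \le 1$, $|\rho| \le \rho_0 < 1$, for $\lambda > 0$. Insertion of (2) in (1) yields for $\phi$ the result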
e r S e eee ede. (3) 
go = ent. Fh rt et Js ; 
The integral of Gegenbauer (Erdélyi et al.), after a suitable change of variable, becomes


T(2A4 X) „% 4 
J * (L-at hola) etetde TA 5 e n 
-1 


for $\lambda > 0$, where $J_{\lambda+k}$ is the Bessel function of order $\lambda + k$.
Hence we find 
T(2A +k) 


2A (5) 
k1T(22) 


2^ v (py 
ét = 00 ran X 9) 


a series of the Neumann type (Watson, 1944).
3. Moments of $f_{\lambda,\rho}$ can be evaluated by writing $J_{\lambda+k}(t)$ as a series and making an appropriate change of summation index.

We have, for $\lambda > 0$, $|t| < \infty$, $k \ge 0$,

$$J_{\lambda+k}(t) = \sum_{m=0}^{\infty} \frac{(-1)^m (t/2)^{\lambda+k+2m}}{m!\,\Gamma(\lambda + k + m + 1)}. \qquad (6)$$
Since the double series 


T(2A+k) T(2A+k) |3|?" 


o o LJ Ld 
k ror. A rote Hou SE * 
P PELA Ld kim T Ed-m „ 251% BOE) 


m! 
e T(2A+k) 
= 2 t | ® - ME, 
exp?) X | iet no ce) 
converges for all $\lambda > 0$, $\rho$, $t$ by the ratio test, we can insert (6) into (5) and rearrange the summation to


obtain T(A+1) 2 gol T(2A +p — 2m) p» 


$0 = TEN 2, HO” S. TMTP-MT m! (p— 2m)! 


(7) 


Thus for the $p$th moment of $f_{\lambda,\rho}$ about zero we obtain at once


, p TAT 1(2A-+p—2m) pr 


Ho = 3» TON „o Tcp ml)mip-3i OP b 


Clearly $\mu'_p$ is an even polynomial in $\rho$ when $p$ is even, and an odd polynomial when $p$ is odd. The polynomials $\mu'_p(\lambda, \rho)$ are similar to, but more complicated than, the Gegenbauer polynomials $C_p^{\lambda}(x)$.
As $\lambda \to \infty$, it is easy to verify that
lim I(2A p — 2m) T(A4 1) 
* T(2A) T(A+p-—m+1) 


= no 2, 


where $\delta$ is the Kronecker symbol, so that the moments $\mu'_p$ tend to $\rho^p$, the moments of the c.f. $\psi(t) = e^{i\rho t}$ corresponding to a unit distribution concentrated at $\rho$.


Explicitly, we have for p = 0, 1, 2, 3 
k=l, 
pA 
Ih —3X1^ 
zm MASSEN T 1 | 
f= Aa A+)? N 
FERT 
T Qxnoc20«23^ 203x002) 
If we set $\lambda = \tfrac12 n$, then for integer $n$ the moments $\mu'_p$ agree with the formulae given by Kendall and indicated by White. I have checked this for $p = 0, 1, \ldots, 10$.


4. The proof of convergence of the standardized distribution to the normal, as $\lambda \to \infty$, can also be carried out in terms of the characteristic function. (Kendall's derivation is based on examination of the moments.)


Note first that o? = (/ N), (U) = At f, where 


awo n (5) (9) 


The c.f. d of the standardized distribution is given by 


g [^4 


80 = 96050 = exp(—iAif,t) GJ ras 1), 


S a 1(2A4 E) Alt 
2— A pts 
PX KITA) 0. 
We now utilize a formula of Neumann for the square of a Bessel function, namely (Watson, 1944)
("^ S m __(2n+2j-1) 
1 (m 2 
Gic Ze C9. I TU 


Clearly — ji (3)-anm E (- (0 ah (2A+2k+2j-1) 
A 


F(z) = (9) 


varla) (CER H1) nto j-1 (2QA+2k+)) (22 2k 2j)j 
For large A, it follows that 


a =e le. 3 C) 
*k a 
Mets mae) ^ 


- T(A- 1) 
ği) = exp ( - 1 5) TEN ` 


5 y T(2A 4- k) i) 
e at Pines ren be: oh 
Ait 
= exp (0t ca) n (mz 2 dere )). a) 
À 
where ${}_1F_1$ is the confluent hypergeometric function. Consider now the function


Y, (z) = exp (- 2iàt0z— 82?) ,F, (2A, A + 1, ilz). 
From the differential equation $xy'' + (\beta - x)y' - \alpha y = 0$ of $y(x) = {}_1F_1(\alpha; \beta; x)$, it follows that $Y_\lambda(z)$ satisfies the differential equation


zY,” , (Atl 4óz? 
dA «nC wu tt Der Ax) 


Hence 


Q8 


ex [ 20420- «(zi 28 app 203042 aan), 42840—1)2 + J 0 . (12) 


7 dA 
Tf we choose PEERS é= den : ii — 1, we obtain 
Arr 
2¥% ARA Shek i= 28 , 282 N! #4) <0 a 
T (S MI U- A) HAN N a j 


If we let A > co formally, we find the limiting equation F — 9 Y „ = constant. Since Y(0) = 1 foreach 
A, we have Fo) = 1. 


More rigorously, we can expand Y, () in the series Y, (z) ŠI + Xr (z) and show that the 


sequence { TV)) is such that TA) = 1+ O A= for fixed z. Thus we have 
ipa) (dpt b ( a J- 19 0 . 
m (2a ix Or) -ev(zacn* x«i) Ya 


LB Ps SED d 4). M 
„ -C ter- beh. en 


Note that the coefficient of ¢ is 


Ap UA ) z pi U 0 
EU ae Bi) = are 
1 1 
and that (riß pP- 0 -1+0(;). 
ense we E dt) = exp - 48) (1-00), (10) 


which proves that the standardized distribution tends to the normal. The technique of asymptotic differential equations employed above may be useful in other such problems; it is known under the name 'W.K.B. method'.
Two further applications of a model for binary regression


By D. R. COX 
Birkbeck College, University of London 


1. Introduction. In a recent paper (Cox, 1958), I have discussed some aspects of a logistic model for 
analysing regression when the dependent variable can take only two values, say 0 and 1. In the present 
note two further applications are presented of what is essentially the same model. The first is to the analysis of 2 × 2 contingency tables based on matched pairs, and the second is to the testing of the agreement
between an observed binary sequence and a corresponding sequence of probabilities. 

2. The 2 × 2 contingency table with matched pairs. Consider the form taken by a simple comparison of matched pairs when the observations are (0, 1) variables. Let there be $n$ pairs of individuals, the pairing usually being such that the two individuals in any one pair tend to be alike. Let one member of each pair belong to group A and the other to group B, the assignment being randomized if a comparative experiment is involved. An observation, taking one of the two values 0 and 1, is made on each individual. For the $i$th pair, let these be represented by random variables $Y_{iA}$, $Y_{iB}$. The possible observations on a pair, writing that on A first, are (0, 0), (0, 1), (1, 0) and (1, 1).

It is possible to form a 2 × 2 contingency table from the data


Group A Group B 


n n 


McNemar (1947) seems to have been the first to point out that the usual $\chi^2$ significance test for such a table is invalid, because it ignores the correlation induced by pairing. He recommended that the significance of the difference between A and B should be tested by rejecting the pairs (0, 0) and (1, 1), and by examining whether the proportion of (1, 0)'s among the remaining 'mixed' observations (0, 1) and (1, 0) is consistent with binomial variation with chance ½. Mosteller (1952) and Cochran (1950) have given further accounts of this test, and Cochran has discussed extensions to the comparison of more than two groups. Stuart (1957) has recently obtained a test equivalent to McNemar's by arguments based on the theory of stratified sampling.

This work raises two problems. Are there circumstances under which the test is optimum, and is there a corresponding estimation procedure? To deal with these questions we must set up a parametric model covering the non-null case. The simplest such model seems to be the following. Let all random variables be mutually independent, and let there be a parameter $\lambda_i$ characteristic of the $i$th pair and a parameter $\psi$ describing the true difference between A and B, such that

$$\Pr(Y_{iA} = 1)/\Pr(Y_{iA} = 0) = \lambda_i, \qquad (1)$$

$$\Pr(Y_{iB} = 1)/\Pr(Y_{iB} = 0) = \psi \lambda_i. \qquad (2)$$

If we write $\lambda_i$ and $\psi$ as exponentials of additive parameters, we have the logistic model of the earlier paper.
It follows by the arguments of that paper, in particular of § 4·5, that the jointly sufficient set of
statistics consists of (i) $\sum Y_{iB}$, (ii) the pair totals $Y_{iA} + Y_{iB}$, $i = 1, \dots, n$. Further, optimum inference
about $\psi$, regarding $\lambda_1, \dots, \lambda_n$ as unknown nuisance parameters, is based on the distribution of (i)
conditionally on the set (ii). Now whenever $Y_{iA} + Y_{iB} \neq 1$, the contribution of the $i$th pair to (i) is fixed. Hence
the conditional distribution just mentioned is equivalent to that of $R$ = number of pairs (0, 1), conditionally
on the observed value of $M$ = number of pairs (0, 1) or (1, 0).

Now a simple calculation from (1) and (2) shows that

$$\Pr(Y_{iA} = 0,\ Y_{iB} = 1 \mid Y_{iA} + Y_{iB} = 1) = \frac{\psi}{1 + \psi} = \theta, \text{ say.} \qquad (3)$$

Therefore $R$, conditionally on the observed value of $M$, has a binomial distribution

$$\Pr(R = r \mid M = m) = \binom{m}{r}\theta^{r}(1 - \theta)^{m - r}. \qquad (4)$$

In particular the optimum test of the null hypothesis $\psi = 1$, $\theta = \tfrac12$, is McNemar's test, and confidence
intervals for $\theta$, and hence for $\psi$, are obtained in the usual way for a binomial parameter. The significance
test can be looked on as the very special case of Haldane & Smith's (1948) test for a serial order effect,
obtained when each series contains just two items.

Example. Mosteller (1952) illustrated the test on an experiment in which each of 100 subjects used
both of two drugs A and B, the response being a dichotomy 'not nausea', 'nausea' (0 and 1, say). 81
subjects never had nausea, i.e. gave the observation (0, 0); 9 subjects gave (1, 0), i.e. had nausea with A
but not with B; 1 subject gave (0, 1); and 9 gave (1, 1). The significance test of the null hypothesis that the
… 1/300 and 4/5.

Tests and interval estimates comparing the values of $\psi$ in different experiments can be done by familiar
techniques for binomial variates.
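The conditional argument in (3) and (4) reduces the analysis to a binomial calculation on the mixed pairs alone. The following is a minimal Python sketch of that calculation. The function name and interface are our own illustration and do not come from the text. It computes exact tail probabilities of $R$ given $M = m$ under $\psi = 1$, together with the estimates $\hat\theta = r/m$ and $\hat\psi = \hat\theta/(1 - \hat\theta)$. For Mosteller's counts, $m = 10$ and $r = 1$, so the lower tail is 11/1024, about 0·011.

```python
from math import comb

def mcnemar_exact(n01, n10):
    """Conditional binomial analysis of matched binary pairs, as in (3) and (4).

    n01: number of (0, 1) pairs (0 on A, 1 on B); this is R in the text.
    n10: number of (1, 0) pairs.
    Concordant (0, 0) and (1, 1) pairs carry no information about psi and are ignored.
    """
    m = n01 + n10                      # M: number of mixed pairs
    if m == 0:
        raise ValueError("no mixed pairs, so R | M carries no information")
    r = n01
    # Under psi = 1 (theta = 1/2), R | M = m is binomial(m, 1/2).
    lower_tail = sum(comb(m, j) for j in range(r + 1)) / 2 ** m
    upper_tail = sum(comb(m, j) for j in range(r, m + 1)) / 2 ** m
    theta_hat = r / m                  # estimate of theta = psi / (1 + psi)
    psi_hat = theta_hat / (1 - theta_hat) if r < m else float("inf")
    return lower_tail, upper_tail, theta_hat, psi_hat

# Mosteller's data: 9 pairs (1, 0) and 1 pair (0, 1); the (0, 0) and (1, 1) pairs drop out.
lower, upper, theta_hat, psi_hat = mcnemar_exact(n01=1, n10=9)
print(lower, theta_hat, psi_hat)
```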
3. Test of agreement between a sequence and a set of probabilities. Let $Y_1, \dots, Y_n$ be mutually independent
random variables, each taking the values 0 and 1, and let $p_1, \dots, p_n$ be a given set of numbers, $0 < p_i < 1$.
Suppose that it is required to use observations on $Y_1, \dots, Y_n$ to test the hypothesis that

$$\Pr(Y_i = 1) = p_i \qquad (i = 1, \dots, n). \qquad (5)$$


For example, a weather forecaster might put forward each day a number purporting to be the
probability that it will rain the following day. It might then be required to test whether the observed
occurrences of rain are consistent with these probabilities.
If $n$ is large, we may group the trials into sets, each with nearly constant $p_i$; then the proportion
of 1's in each set can be compared with the corresponding $p_i$. Let $n$ be too small for this.
One method of deriving a small-sample test, when no special alternatives to (5) are available, is to
consider a family of probabilities derived from (5). This family is characterized by a continuous parameter $\beta$:

$$\log\frac{\Pr_\beta(Y_i = 1)}{\Pr_\beta(Y_i = 0)} = \beta\log\frac{p_i}{1 - p_i}. \qquad (6)$$


The null hypothesis (5) corresponds to $\beta = 1$. If $\beta > 1$, the suggested probabilities $p_i$ show the right general
pattern of variation, but do not vary enough; if $0 < \beta < 1$, the suggested probabilities vary too much.
If $\beta < 0$, the $p_i$ vary in the wrong direction, and if $\beta = -1$, the $p_i$ are the complements of the true
probabilities.

The log likelihood under (6) of an observed series $y_1, \dots, y_n$ is

$$\beta\sum y_i\log p_i + \beta\sum(1 - y_i)\log(1 - p_i) - \sum\log\{p_i^{\beta} + (1 - p_i)^{\beta}\}. \qquad (7)$$


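Hence the sufficient statistic is obtained by scoring

$$X_i = \begin{cases}\log(2p_i) & \text{when } Y_i = 1,\\ \log\{2(1 - p_i)\} & \text{when } Y_i = 0,\end{cases} \qquad (8)$$

and by considering a total score $X = \sum X_i$. The factor 2 is included to make the expected score non-negative
and to arrange that an event of probability ½ scores 0.
Under the null hypothesis $\beta = 1$,

$$E_1(X) = n\log 2 + \sum p_i\log p_i + \sum(1 - p_i)\log(1 - p_i), \qquad (9)$$
$$V_1(X) = \sum p_i(1 - p_i)\left\{\log\frac{p_i}{1 - p_i}\right\}^2. \qquad (10)$$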


564 Miscellanea 


Provided that $n$ is not very small and that none of the $p_i$ is near 0 or 1, the distribution of $X$ is nearly
normal.

In principle it would be possible to calculate confidence intervals for $\beta$ from an observed value $X = x$.
If $x$ significantly exceeds (9), this is evidence that $\beta > 1$.

Example. Suppose that there are 16 trials, 8 of which have outcome 1 and 8 have outcome 0. Let the
$p_i$ corresponding to the zero observations be 0·1, 0·1, 0·2, 0·2, 0·4, 0·5, 0·6, 0·7, and those corresponding to the
unit observations be 0·3, 0·3, 0·5, 0·6, 0·6, 0·8, 0·9, 0·9.

Thus, using common logarithms, the score for the first observation recorded as 0 is $\log[2(1 - p_i)] = \log 1{\cdot}8 = 0{\cdot}255$, and the score
for the first observation recorded as 1 is $\log(2p_i) = \log 0{\cdot}6 = -0{\cdot}222$. We find that the total observed
score is $x = 1{\cdot}106$ and that, under the hypothesis $\beta = 1$, equations (9) and (10) give

$$E_1(X) = 1{\cdot}030, \qquad V_1(X) = 0{\cdot}785,$$

so that there is excellent agreement with expectation. Under the hypothesis $\beta = 0$, i.e. that 1's occur
randomly with constant chance ½, we find

$$E_0(X) = n\log 2 + \tfrac12\sum\log[p_i(1 - p_i)] = -1{\cdot}329,$$
$$V_0(X) = \tfrac14\sum\left\{\log\frac{p_i}{1 - p_i}\right\}^2 = 1{\cdot}314.$$

The observed value differs significantly from $E_0(X)$ at the 5% level. Thus the data support the idea that
1's do not occur with constant chance ½ and are in excellent agreement with the suggested probabilities.
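The arithmetic of this example can be checked with a short Python sketch of (8), (9) and (10), together with the $\beta = 0$ moments. The function name is hypothetical, and common logarithms are assumed, as in the worked figures. With the data above, the sketch reproduces $x$, $E_1(X)$, $E_0(X)$ and $V_0(X)$ to within about 0·003, and the normal deviate $(x - E_0(X))/\sqrt{V_0(X)}$ comes out a little above 2.

```python
import math

def spread_test(p_zero, p_one):
    """Scores (8) and null moments (9), (10) for the test of spread.

    p_zero: suggested probabilities for the trials whose outcome was 0
    p_one:  suggested probabilities for the trials whose outcome was 1
    Common (base-10) logarithms are assumed, as in the worked example.
    """
    log = math.log10
    p_all = list(p_zero) + list(p_one)
    n = len(p_all)
    # Observed total score x, equation (8)
    x = sum(log(2 * p) for p in p_one) + sum(log(2 * (1 - p)) for p in p_zero)
    # Mean and variance under beta = 1, equations (9) and (10)
    e1 = n * log(2) + sum(p * log(p) + (1 - p) * log(1 - p) for p in p_all)
    v1 = sum(p * (1 - p) * log(p / (1 - p)) ** 2 for p in p_all)
    # Mean and variance under beta = 0 (constant chance 1/2 on every trial)
    e0 = n * log(2) + 0.5 * sum(log(p * (1 - p)) for p in p_all)
    v0 = 0.25 * sum(log(p / (1 - p)) ** 2 for p in p_all)
    return x, e1, v1, e0, v0

p_zero = [0.1, 0.1, 0.2, 0.2, 0.4, 0.5, 0.6, 0.7]
p_one = [0.3, 0.3, 0.5, 0.6, 0.6, 0.8, 0.9, 0.9]
x, e1, v1, e0, v0 = spread_test(p_zero, p_one)
z_beta0 = (x - e0) / math.sqrt(v0)   # approximate normal deviate against beta = 0
print(round(x, 3), round(e1, 3), round(e0, 3), round(v0, 3), round(z_beta0, 2))
```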


The family (6), on which the test just described is based, is especially appropriate when the sequence
$\{p_i\}$ is known to be correct at and near $p = \tfrac12$ but possibly incorrectly spread around $p = \tfrac12$. Thus we may
call the test based on (9) and (10) a test for spread. A natural generalization is to replace (6) by

$$\log\frac{\Pr_{\alpha,\beta}(Y_i = 1)}{\Pr_{\alpha,\beta}(Y_i = 0)} = \beta\log\frac{p_i}{1 - p_i} + \alpha, \qquad (11)$$

the null hypothesis being that $\beta = 1$, $\alpha = 0$. The pair of sufficient statistics are $X$, as defined previously,
and $Y = \sum Y_i$. Under the null hypothesis, $X$ and $Y$ are nearly jointly normally distributed, with the mean
and variance of $X$ given by (9) and (10) and with

$$E_1(Y) = \sum p_i, \qquad V_1(Y) = \sum p_i(1 - p_i), \qquad (12)$$
$$C_1(X, Y) = \sum p_i(1 - p_i)\log\frac{p_i}{1 - p_i}. \qquad (13)$$

Note that if the $p_i$ are symmetrically arranged about ½, $X$ and $Y$ are uncorrelated.
A test for bias ignoring spread will be based on $Y$ alone, i.e. solely on the observed total number of 1's.
If both bias and spread are of interest, it is necessary to specify the relative importance to be attached to
each, if an optimum small-sample procedure is to be found. Since it is rarely possible to do this, a sensible
practical approach is to find the observed values $x$ and $y$ and to see whether

$$\begin{pmatrix} x - E_1(X) & y - E_1(Y)\end{pmatrix}\begin{pmatrix} V_1(X) & C_1(X, Y)\\ C_1(X, Y) & V_1(Y)\end{pmatrix}^{-1}\begin{pmatrix} x - E_1(X)\\ y - E_1(Y)\end{pmatrix} \qquad (14)$$

is significantly large in the $\chi^2$ distribution with 2 degrees of freedom. The expression (14) is, except for
a factor $-\tfrac12$, the exponent in the bivariate normal distribution of $X$ and $Y$; it is the likelihood ratio statistic
for testing the hypothesis that $X$, $Y$ have the bivariate normal distribution (9), (10), (12) and (13), against
the hypothesis that $X$, $Y$ have arbitrary means but the same covariance matrix as under the null
hypothesis. This, of course, does not allow for the fact that the covariance matrix varies in a determined
way with the parameters $\alpha$ and $\beta$. However, the determination of the correct likelihood ratio criterion
requires the maximum likelihood estimation of $\alpha$ and $\beta$, which is tedious.

Example. Consider the data that were analysed previously. We have that the observed value of $Y$ is
$y = 8$ and that $E_1(Y) = 7{\cdot}7$, $V_1(Y) = 2{\cdot}930$, $C_1(X, Y) = -0{\cdot}090$. Therefore the observed value of $Y$,
as well as that of $X$, agrees well with its expectation under the suggested scheme of probabilities, and
the need for a combined test hardly arises. The formal details of such a test are that

$$\begin{pmatrix} 1{\cdot}106 - 1{\cdot}030 & 8 - 7{\cdot}7\end{pmatrix}\begin{pmatrix} 0{\cdot}785 & -0{\cdot}090\\ -0{\cdot}090 & 2{\cdot}930\end{pmatrix}^{-1}\begin{pmatrix} 1{\cdot}106 - 1{\cdot}030\\ 8 - 7{\cdot}7\end{pmatrix} \qquad (15)$$

is to be tested as $\chi^2$ with 2 degrees of freedom. The value of expression (15) is 0·01: a value smaller than this
would arise by chance only about 1 in 100 times.
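Expression (14) is a 2 × 2 quadratic form, so it can be evaluated without a matrix library. The sketch below is illustrative only, and its function name is hypothetical. It takes the summary quantities as inputs. For 2 degrees of freedom, the $\chi^2$ distribution function is exactly $1 - e^{-q/2}$, which gives the lower-tail probability directly without tables.

```python
import math

def combined_chi2(x, e_x, v_x, y, e_y, v_y, c_xy):
    """Quadratic form (14) for the joint test of spread and bias.

    Returns q and P(chi-squared on 2 d.f. <= q); for 2 d.f. the chi-squared
    distribution function is exactly 1 - exp(-q/2).
    """
    dx, dy = x - e_x, y - e_y
    det = v_x * v_y - c_xy ** 2
    # Inverse of [[v_x, c], [c, v_y]] is [[v_y, -c], [-c, v_x]] / det
    q = (v_y * dx * dx - 2 * c_xy * dx * dy + v_x * dy * dy) / det
    return q, 1 - math.exp(-q / 2)

# Summary quantities as printed in the example above
q, lower = combined_chi2(x=1.106, e_x=1.030, v_x=0.785,
                         y=8, e_y=7.7, v_y=2.930, c_xy=-0.090)
print(q, lower)
```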


There are further problems connected with the general situation discussed here. First, the same set
of observations can be consistent with several alternative sequences of probabilities, and it may be
required to consider which sequence is preferable. It seems reasonable to prefer that sequence of probabilities
for which the information in Shannon's sense, $-\sum\{p_i\log p_i + (1 - p_i)\log(1 - p_i)\}$, is a minimum, for this implies minimum uncertainty
concerning the outcome of the realized sequence. According to (9), this amounts to preferring the probabilities
for which $E_1(X)$ is a maximum. Secondly, it happens in some applications that the probabilities
$p_i$ are not given, but have to be estimated from data by fitting a particular type of model, often to the
same data with which goodness of fit is to be tested. In such cases, the most satisfactory test of goodness
of fit is likely to be obtained by fitting a model containing additional parameters and testing estimates of
the additional parameters for significance from zero. The approach of the present section is relevant only
when there are available no special forms of alternative specific to the problem.
A note on a series solution of a problem in estimation*


By IRWIN GUTTMAN 
University of Alberta and Princeton University 


1. INTRODUCTION AND SUMMARY 


If $t(x)$ is a sufficient statistic for the family of probability functions $\{p_\theta \mid \theta \in \Omega\}$ defined over the real line,
and if $f(x)$ is an unbiased estimator of a real-valued function of the parameter, say $g(\theta)$, then it is well
known that the function

$$h(t) = E\{f(X) \mid t\}$$

is an unbiased estimator of $g(\theta)$, and that it has smaller variance and risk (for strictly convex loss functions)
than $f(x)$, unless of course $f(x) = h(t)$ almost everywhere. Further, if $t$ is also a complete
statistic, then $h(t)$ is the unique Uniformly Minimum Variance (UMV) unbiased estimate of $g(\theta)$.

The above holds for continuous and discrete probability functions. We discuss here the case where
the $p_\theta$ are discrete probability distribution functions defined on the real line, with probability densities
$p_\theta(x)$, where $x = 0, 1, 2, \dots$.

Under certain regularity conditions given in § 2, a method of determining $h(t)$ without using any
unbiased estimator $f(x)$ of $g(\theta)$ at all is given. This has the feature, then, of avoiding the calculation of
conditional expectations. The method also allows for a solution of a problem raised by Girshick, Mosteller
& Savage (1946). This is discussed in § 3, where some examples are given to illustrate the method.
It is interesting to note that a special case of the method has been used by Lehmann & Scheffé (…) to
prove completeness of some statistics.


* Prepared in connexion with research sponsored by the Office of Naval Research. 
2. THE METHOD


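Using the notation of § 1, we now state the following:

Theorem. If $t(X)$ is a sufficient statistic, assuming only non-negative integer values,* with probabilities

$$p_\theta(t) = m(\theta)\,k_t\,\theta^{t} \qquad (t = 0, 1, 2, \dots)$$

depending on a single parameter $\theta$ taking values in an interval containing the origin, then there exists an
essentially unique unbiased estimate of $g(\theta)$ with uniformly minimum variance if, and only if,
$G(\theta) = g(\theta)/m(\theta)$ is analytic at $\theta = 0$, with a power series expansion $G(\theta) = \sum a_t\theta^{t}$ such that $a_t = 0$ for all
$t$ for which $k_t = 0$.

Proof. Sufficiency. We prove that

$$f_t = \begin{cases} a_t/k_t & (k_t \neq 0),\\ \text{arbitrary} & (k_t = 0)\end{cases}$$

is an unbiased estimate of $g(\theta)$ for all $\theta$; indeed $\sum_t f_t\,m(\theta)\,k_t\,\theta^{t} = m(\theta)\sum_t a_t\theta^{t} = g(\theta)$. To prove that it has
uniformly minimum variance we need only show that $t$ is complete. Values of $t$ with $k_t = 0$ occur with probability
zero and may be excluded. Putting $a_t = 0$ for all $t$ (that is, $g \equiv 0$), we see that $f_t = 0$ is the only unbiased estimate
of zero, proving the result.

Necessity. If $f_t$ is an unbiased estimate of $g(\theta)$,

$$\sum_t f_t\,m(\theta)\,k_t\,\theta^{t} = g(\theta), \qquad\text{or}\qquad \sum_t a_t\theta^{t} = G(\theta),$$

with $a_t = f_t k_t$. Since the range of $\theta$ includes the origin, $G(\theta)$ is analytic at $\theta = 0$, and the rest follows as in the sufficiency
part of the proof.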
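The theorem turns the search for a UMV unbiased estimator into reading off power-series coefficients. The following is a minimal sketch. It assumes the coefficients $a_t$ and $k_t$ are supplied as Python functions, and the name `umv_estimate` is ours. It uses exact rational arithmetic. The check uses the negative binomial case of § 3 (a), where $h(t)$ should equal $(n - 1)/(n + t - 1)$.

```python
from fractions import Fraction
from math import comb

def umv_estimate(a, k, t):
    """UMV unbiased estimate h(t) = a_t / k_t, following the theorem of Section 2.

    a(t): coefficient of theta**t in G(theta) = g(theta) / m(theta)
    k(t): coefficient k_t in p_theta(t) = m(theta) * k_t * theta**t
    """
    kt = k(t)
    if kt == 0:
        raise ValueError("k_t = 0, so this value of t has probability zero")
    return Fraction(a(t), kt)

# Illustrative check (negative binomial, n stages): m(Q) = (1 - Q)**n,
# k_t = C(n+t-1, t); for g(Q) = 1 - Q, G(Q) = (1 - Q)**(1 - n) = sum C(n+t-2, t) Q**t.
n = 4
estimates = [umv_estimate(lambda s: comb(n + s - 2, s),
                          lambda s: comb(n + s - 1, s), t) for t in range(6)]
print(estimates)
```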


3. SOME EXAMPLES 


(a) The negative binomial. Suppose that items of a manufactured process are such that each item
produced is an independent trial with probability $Q$ of being non-defective, and $P = 1 - Q$ of being
defective. Suppose the process is subjected to an inspection in $n$ stages, the inspection in any one
stage continuing as long as the trials show non-defective items, and stopping as soon as a defective turns up.
If we let $X_i + 1$ be the number of independent trials needed to get a defective in the $i$th stage, then

$$p_Q(x_i) = PQ^{x_i}, \qquad (3{\cdot}1)$$

where $x_i$ is the number of non-defectives in the $i$th stage. It is well known that $T = \sum_{i=1}^{n} X_i$ is a sufficient
and complete statistic for the parameter $Q$. The density of $T$ is

$$p_Q(t) = \binom{n+t-1}{t}P^{n}Q^{t}.$$


Suppose now that an unbiased estimate of $g(Q) = 1 - Q$ is wanted, based on the above inspection procedure.
Then it is quickly verified that, in the notation of the theorem, provided $n > 1$,

$$a_t = \binom{n+t-2}{t}, \qquad k_t = \binom{n+t-1}{t},$$

and hence $h(t) = (n-1)/(n+t-1)$. The fact of $h(t)$ being unbiased was known to Haldane (1945). Because
it is a function of the complete and sufficient statistic, it is the unique UMV unbiased estimate of the
parameter $1 - Q$.

* If $t(X) = \sum X_i$, where the $X_i$ are $n$ independent observations from a distribution of the exponential
type, the condition $p_\theta(t) = m(\theta)\,k_t\,\theta^{t}$ holds automatically. For if the density is of the form
$f_\theta(x) = u(\theta)\,\theta^{x}K(x)$ (where $\theta$ may be written as $e^{\phi}$, say), then $t = \sum X_i$ is sufficient (easily seen by the Neyman
criterion; see Fraser (1957, p. 20)) and has density $[u(\theta)]^{n}k_t\theta^{t}$, which we denote by $m(\theta)\,k_t\,\theta^{t}$.

The above estimate of $P$ was also known to Girshick et al. (1946). In their paper, a method was given
which allowed estimates of $P^{x}Q^{y}$ to be determined, provided that $x$ and $y$ were integral and non-negative.
That is, their method would not allow for an unbiased estimate of, say, the variance of $X$, where $X$ has the
density (3·1), that is, of $Q/P^{2}$.
Our method does, for it is again very easy to see that here $G(Q) = Q(1 - Q)^{-(n+2)}$, so that $a_t = \binom{n+t}{t-1}$ for $t \geq 1$ and $a_0 = 0$,
obtaining

$$h(t) = \binom{n+t}{t-1}\bigg/\binom{n+t-1}{t} = \frac{t(n+t)}{n(n+1)}.$$

This is the unique UMV unbiased estimate of $\operatorname{var}(X)$.
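Unbiasedness of both negative binomial estimators can be confirmed numerically by summing $h(t)$ against the density of $T$. The sketch below truncates the series at a large $t$, so the result is an approximation. The values $n = 3$ and $Q = 0{\cdot}6$ are arbitrary choices for illustration, and the function name is hypothetical.

```python
import math

def expect_negbin(h, n, Q, tmax=3000):
    """Approximate E{h(T)} for T with density C(n+t-1, t) P**n Q**t (series cut at tmax)."""
    P = 1 - Q
    total = 0.0
    for t in range(tmax):
        # log of the negative binomial probability, computed via lgamma to avoid overflow
        log_prob = (math.lgamma(n + t) - math.lgamma(t + 1) - math.lgamma(n)
                    + n * math.log(P) + t * math.log(Q))
        total += h(t) * math.exp(log_prob)
    return total

n, Q = 3, 0.6
P = 1 - Q
mean_p = expect_negbin(lambda t: (n - 1) / (n + t - 1), n, Q)
mean_var = expect_negbin(lambda t: t * (n + t) / (n * (n + 1)), n, Q)
print(mean_p, P)               # estimator of P = 1 - Q
print(mean_var, Q / P ** 2)    # estimator of var(X) = Q / P**2
```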
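(b) The Poisson distribution. Let $X$ have the Poisson distribution with parameter $\lambda$, that is

$$p_\lambda(x) = \frac{e^{-\lambda}\lambda^{x}}{x!} \qquad (x = 0, 1, 2, \dots).$$

If a sample of $n$ independent observations is taken on $X$, then it is well known that $T = \sum_{i=1}^{n} X_i$ is a sufficient
and complete statistic for $\lambda$, and it has the Poisson distribution with parameter $n\lambda$. Suppose we
wish to find an unbiased estimate of the probability that $X = 0$ or 1, that is, of

$$g(\lambda) = e^{-\lambda}(1 + \lambda).$$

Here $m(\lambda) = e^{-n\lambda}$ and $k_t = n^{t}/t!$, so that $G(\lambda) = e^{(n-1)\lambda}(1 + \lambda)$. Then, for $n > 1$,

$$a_t = \frac{(n-1)^{t}}{t!} + \frac{(n-1)^{t-1}}{(t-1)!} \quad (\text{the second term being absent when } t = 0), \qquad k_t = \frac{n^{t}}{t!},$$

and so we have

$$h(t) = \left(\frac{n-1}{n}\right)^{t}\left(1 + \frac{t}{n-1}\right).$$

This is, of course, the unique UMV unbiased estimate of the probability that $X$ will be zero or one.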
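The Poisson estimate can be checked the same way by averaging $h(T)$ over $T \sim$ Poisson$(n\lambda)$ and comparing the result with $e^{-\lambda}(1 + \lambda)$. The following is a sketch with arbitrary illustrative values $n = 5$ and $\lambda = 1{\cdot}3$. The function names are hypothetical, and the series is truncated.

```python
import math

def umv_prob_zero_or_one(t, n):
    """h(t) = ((n-1)/n)**t * (1 + t/(n-1)): UMV unbiased estimate of P(X = 0 or 1), n > 1."""
    return ((n - 1) / n) ** t * (1 + t / (n - 1))

def expect_poisson(h, mu, tmax=400):
    """Approximate E{h(T)} for T ~ Poisson(mu), summing t = 0, ..., tmax - 1."""
    return sum(h(t) * math.exp(t * math.log(mu) - mu - math.lgamma(t + 1))
               for t in range(tmax))

n, lam = 5, 1.3
mean_h = expect_poisson(lambda t: umv_prob_zero_or_one(t, n), n * lam)
print(mean_h, math.exp(-lam) * (1 + lam))
```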


Thanks are due to Prof. F. J. Anscombe, whose valuable discussion and … to acknowledge.
On Nair's transformation of the correlation coefficient


By MUNUSWAMY SANKARAN 
Presidency College, Madras 


Fisher's z-transformation of the correlation coefficient $r$ is well known to many readers. Recently Harley
(1956), while disproving a conjecture of Hotelling (1953), studied some further properties of $\sin^{-1}r$ due
to W. F. Sheppard. Little is known, however, of a statistic suggested by U. S. Nair. It is proposed here
to give some new results for the distribution of Nair's statistic, studied by Pillai (1946), and also for an
allied statistic. The statistic suggested by Nair is $x = (r - \rho)/(1 - \rho r)$, which, like $r$, ranges between $-1$
and $+1$.

The results that emerge from the present study are:

(i) $x$ and an angular transformation of $x$, namely $\sin^{-1}x$, are both asymptotically normally distributed,
and $\sin^{-1}x$ is more nearly normal than $x$.

(ii) The first and second approximations of $\sin^{-1}x$ are as good as the corresponding approximations of $z$.

(iii) Applying the Cornish-Fisher form of the Edgeworth expansion to $\sin^{-1}x$ for the evaluation of the probability
integral of $r$, it is seen that for $n = 50$, $\rho = 0{\cdot}9$ it is as accurate as $z$, but for $n = 25$, $\rho = 0{\cdot}9$ it is not.

Although it is not clear that the transformation $\sin^{-1}x$ has any practical advantage over Fisher's z-transformation
(indeed it is not appropriate in some situations where $z$ is useful), nevertheless the
results are felt to be of sufficient interest to put on record.


NORMALITY OF NAIR'S STATISTIC


That the statistic $x = (r - \rho)/(1 - \rho r)$ is asymptotically normally distributed can be established with
the help of a result of Cramér's (1951, Ex. 23, p. 259). To find the rapidity of convergence
we evaluate the first four moments. For this, we expand the various powers of $x$, namely $x$, $x^2$, $x^3$, $x^4$,
in the form of series of powers of $(r - \rho)$, retaining terms up to $(r - \rho)^6$, and take the expectations. Since
the expected values of $(r - \rho)^h$ for $h = 1$ to 6 are given by Hotelling (1953), the moments of $x$ follow
without difficulty. The following are the first four moments:


[The expressions for the first four moments of $x$ are not legible in the source scan.]

From the above the first four cumulants are obtained and are

[The expressions for $\kappa_1(x), \dots, \kappa_4(x)$ are not legible in the source scan.]
Let us now consider the angular transformation of $x$ given by

$$y = \sin^{-1}x = \sin^{-1}\frac{r - \rho}{1 - \rho r}.$$

The moments of $y$ are given by

[Expressions not legible in the source scan.]

Hence the cumulants of $\sin^{-1}x$ are given by

[Expressions not legible in the source scan.]

The above expressions for the cumulants of $\sin^{-1}x$ can be compared with those of $z$, given by

[Expressions not legible in the source scan.]

Comparison of the 3rd and 4th cumulants shows that $y$ is more nearly normal than $x$.


PROBABILITY INTEGRAL OF r


Harley, in one of her recent papers (1954) on the probability integral of $r$, applied the Edgeworth expansion
to the z-transformation of Fisher, because of the inadequacy of the z-transformation for values of $\rho$
near unity and because the exact values are not available in David's tables between $n = 25$ and $n = 50$.

We shall also apply the Cornish-Fisher form of the Edgeworth expansion to $y = \sin^{-1}x$ for $n = 25$,
$\rho = 0{\cdot}9$ and $n = 50$, $\rho = 0{\cdot}9$. Also, we shall use two other approximations, called the first and second
approximations of $y$. The first approximation of $y$ assumes that $y$ is normal with mean zero and variance
$1/(n - 2)$. The second approximation assumes that $y$ is normal with mean and variance given by (4).


Table 1. Comparison of exact and approximate values of the probability integral of r
Case n = 25, ρ = 0·6

[Table entries not legible in the source scan.]
Tables 1 and 2 give a comparison of the exact values with the various approximations of $y$ and $z$. The
conclusion reached is that the first and second approximations of $y$ are as good as the corresponding
approximations of $z$, and that the Edgeworth expansion of $y$ is as successful as the corresponding expansion
of $z$ for $n = 50$, $\rho = 0{\cdot}9$, but not so good as $z$ for $n = 25$, $\rho = 0{\cdot}9$.


Table 2
(Column headings as printed: from y, approximation II and values from the Edgeworth expansion of y;
from z, approximation II and values from the Edgeworth expansion of z.)

(a) Case n = 25, ρ = 0·9

    0·00698   0·00742   0·00785   0·007456
     ·05589    ·05578    ·05526    ·05570
     ·09981    ·09862    ·09740    ·09847
     ·22576    ·22379    ·22196    ·22359
     ·46250    ·46247    ·46239    ·46244
     ·78442    ·78661    ·78850    ·78663
     ·94609    ·94612    ·94620    ·94606
     ·99325    ·99264    ·99197    ·99256

(b) Case n = 50, ρ = 0·9

    0·01285   0·01265   0·01286   0·01307   0·01288
     ·05998    ·06024    ·06001    ·05975    ·05999
     ·23202    ·23294    ·23200    ·23119    ·23204
     ·47403    ·47405    ·47404    ·47402    ·47404
     ·77108    ·77009    ·77112    ·77202    ·77108
     ·88871    ·88814    ·88872    ·88928    ·88872
     ·99174    ·99204    ·99174    ·99144    ·99172


CONFIDENCE INTERVAL FOR ρ


The confidence interval can be obtained from $\sin^{-1}x$. Assuming that $\sin^{-1}\{(r - \rho)/(1 - \rho r)\}$ is normal with
mean zero and variance $1/(n - 2)$, let $1 - 2\alpha$ be a preassigned confidence coefficient chosen sufficiently high.
If $a$ is such that

$$\Pr\left\{-a \le \sqrt{n-2}\,\sin^{-1}\frac{r - \rho}{1 - \rho r} \le a\right\} = 1 - 2\alpha,$$

where $a$ satisfies $(2\pi)^{-1/2}\int_a^\infty e^{-t^2/2}\,dt = \alpha$, then the above is equivalent to

$$\Pr\left\{\frac{r - d}{1 - rd} \le \rho \le \frac{r + d}{1 + rd}\right\} = 1 - 2\alpha, \qquad (A)$$

where $d = \sin\{a/\sqrt{n - 2}\}$.
The following example is taken from David's Tables of the Correlation Coefficient (1938).
The width of span (x) and length of forearm (y) of twenty males have been measured, and the correlation
between the two variates is found to be 0·55. Assuming that width of span and length of forearm are
approximately normally distributed, what interval will cover the correlation coefficient between x and
y in the population?
From David's charts I and II, for α = 0·05, 0·215 < ρ < 0·755, and for α = 0·025, 0·140 < ρ < 0·790. Then,
using (A), and $z$ with variance $1/(n - 3)$, we get

    α = 0·05:    from y: 0·217 ≤ ρ ≤ 0·768;    from z: 0·216 ≤ ρ ≤ 0·769;
    α = 0·025:   from y: 0·138 ≤ ρ ≤ 0·800;    from z: 0·142 ≤ ρ ≤ 0·798.
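The interval (A) and the corresponding z interval are easy to compute. The Python sketch below makes two assumptions. First, α denotes the probability in each tail, so the coefficient is $1 - 2\alpha$, which matches the pairing of α with David's charts above. Second, $z = \tanh^{-1}r$ is treated as normal with mean $\tanh^{-1}\rho$ and variance $1/(n - 3)$. The function names are hypothetical. With $r = 0{\cdot}55$ and $n = 20$, the sketch reproduces the intervals from $y$ and $z$ above to three decimals.

```python
import math
from statistics import NormalDist

def ci_from_y(r, n, alpha):
    """Interval (A): sqrt(n-2) * asin((r - rho)/(1 - rho*r)) taken as N(0, 1).
    alpha is the probability in each tail, so the coefficient is 1 - 2*alpha."""
    a = NormalDist().inv_cdf(1 - alpha)
    d = math.sin(a / math.sqrt(n - 2))
    return (r - d) / (1 - r * d), (r + d) / (1 + r * d)

def ci_from_z(r, n, alpha):
    """Fisher's z = atanh(r), taken as normal with mean atanh(rho), variance 1/(n-3)."""
    a = NormalDist().inv_cdf(1 - alpha)
    z, half = math.atanh(r), a / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)

for alpha in (0.05, 0.025):
    lo_y, hi_y = ci_from_y(0.55, 20, alpha)
    lo_z, hi_z = ci_from_z(0.55, 20, alpha)
    print(alpha, round(lo_y, 3), round(hi_y, 3), round(lo_z, 3), round(hi_z, 3))
```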


In conclusion, it is a pleasure to acknowledge the help I have received from my students in the preparation
of the tables. Particular mention must be made of Mr G. Balakrishnan, Mr R. Raman, Mr S. Rajagopalan,
Mr V. Sivakumaran and Mr S. R. Srinivasavaradan. I also wish to express my sincere thanks to
Prof. E. S. Pearson for his helpful comments on earlier versions of the paper. 
Short proof of Miss Harley's theorem on the correlation coefficient


By H. E. DANIELS, University of Birmingham
AND M. G. KENDALL, Research Techniques Unit, London School of Economics 


1. In considering the correlation coefficient $r$ based on a sample of $n$ values from a bivariate normal
population with correlation parameter $\rho$, Hotelling (1953) discussed the question: does there exist
a function $f(r)$, independent of $n$, such that $E\{f(r)\} = f(\rho)$ for all $n$? He showed that for functions expressible
as a Taylor series such a function could only be of the inverse sine type. Misled by an algebraical
slip, he also concluded that even this kind of function failed to provide an answer. The error was
noted by Harley (1956), who proved that for even $n$

$$E(\sin^{-1}r) = \sin^{-1}\rho. \qquad (1)$$

Later (1957) she proved that this holds also for odd $n$. Her demonstrations are rather long and …,
and the following not only has the merit of being much shorter, but exhibits the basic reason for the
existence of the relationship (1).


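2. If a pair of values $(x_1, y_1)$, $(x_2, y_2)$ are chosen at random from a bivariate normal population, there is
said to be a concordance of type 1 if $x_1 - x_2$ and $y_1 - y_2$ have the same sign (cf. Kendall, 1955); or equivalently,
if

$$\operatorname{sgn}(x_1 - x_2)\,\operatorname{sgn}(y_1 - y_2) = 1. \qquad (2)$$

The probability that this is so is $\tfrac12 + (1/\pi)\sin^{-1}\rho$ (Kendall, 1955), so that the expected value of the left-hand
side of (2) is $(2/\pi)\sin^{-1}\rho$. Consider now a sample of $n$ values $(x, y)$ from the population and a randomly chosen
subset of two, say $(x_1, y_1)$, $(x_2, y_2)$, from this sample. Let $C$ represent a concordance of type 1 in the subset.
Since $r$ can be regarded as sufficient for $\rho$ in the subclass of scale-invariant statistics, we have

$$\tfrac12 + (1/\pi)\sin^{-1}\rho = P(C \mid \rho) = E\{P(C \mid r)\}. \qquad (3)$$

To establish (1) we have only to prove that $P(C \mid r) = \tfrac12 + (1/\pi)\sin^{-1}r$, or equivalently that
$E\{\operatorname{sgn}(x_1 - x_2)\operatorname{sgn}(y_1 - y_2) \mid r\} = (2/\pi)\sin^{-1}r$.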


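No generality is lost if we measure $x$ and $y$ from the sample means. This done, we have for $r$

$$r = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2\sum y_i^2}} = \frac{\mathbf{x}\cdot\mathbf{y}}{|\mathbf{x}|\,|\mathbf{y}|} = \cos\theta, \text{ say.} \qquad (4)$$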
For fixed $r$ the vectors $\mathbf{x}$ and $\mathbf{y}$ in the sample space are at a fixed angle $\theta$ to each other and otherwise
randomly orientated in the space $\sum x_i = 0$, whatever the value of $\rho$. Consider $E\{\operatorname{sgn}(x_1 - x_2)\operatorname{sgn}(y_1 - y_2) \mid r\}$.
The product $\operatorname{sgn}(x_1 - x_2)\operatorname{sgn}(y_1 - y_2)$ is $+1$ if $\mathbf{x}$, $\mathbf{y}$ are both on the same side of the hyperplane $x_1 - x_2 = 0$,
and $-1$ in the opposite case. For any fixed orientation of the two-dimensional plane spanned by $(\mathbf{x}, \mathbf{y})$, the
hyperplane $x_1 - x_2 = 0$ cuts it in a line, and the probability that this line lies between $\mathbf{x}$ and $\mathbf{y}$ is
$\theta/\pi = (\cos^{-1}r)/\pi = p$, say. This is therefore the probability, for fixed $r$, that $\mathbf{x}$ and $\mathbf{y}$ are on opposite sides
of $x_1 - x_2 = 0$ for random orientations of $(\mathbf{x}, \mathbf{y})$. Hence

$$E\{\operatorname{sgn}(x_1 - x_2)\operatorname{sgn}(y_1 - y_2) \mid r\} = (1 - p) - p = 1 - 2p = 1 - (2/\pi)\cos^{-1}r = (2/\pi)\sin^{-1}r,$$

and the result (1) follows.
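Relationship (1) lends itself to a simulation check. The sketch below averages $\sin^{-1}r$ over repeated bivariate normal samples. The function name is hypothetical, and $\rho = 0{\cdot}6$, $n = 5$ and the seed are arbitrary choices. The sketch also averages $r$ itself, which is biased for $\rho$, and the sign product of the first two observations, whose expectation is $(2/\pi)\sin^{-1}\rho$. Agreement is only to within Monte Carlo sampling error.

```python
import math
import random

def simulate(rho, n, reps, seed=1):
    """Monte Carlo averages of asin(r), r, and sgn(x1 - x2) * sgn(y1 - y2)."""
    rng = random.Random(seed)
    s = math.sqrt(1 - rho * rho)
    asin_sum = r_sum = sign_sum = 0.0
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            u, v = rng.gauss(0, 1), rng.gauss(0, 1)
            xs.append(u)
            ys.append(rho * u + s * v)      # corr(x, y) = rho
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx = sum((a - mx) ** 2 for a in xs)
        syy = sum((b - my) ** 2 for b in ys)
        r = sxy / math.sqrt(sxx * syy)
        asin_sum += math.asin(max(-1.0, min(1.0, r)))   # clamp guards rounding
        r_sum += r
        # sign product for the first two observations of the sample
        sign_sum += math.copysign(1, xs[0] - xs[1]) * math.copysign(1, ys[0] - ys[1])
    return asin_sum / reps, r_sum / reps, sign_sum / reps

mean_asin, mean_r, mean_sign = simulate(rho=0.6, n=5, reps=40000)
print(mean_asin, math.asin(0.6))                 # relationship (1)
print(mean_r, 0.6)                               # r itself is biased towards 0
print(mean_sign, 2 / math.pi * math.asin(0.6))   # expectation of the sign product
```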


3. An immediate consequence is that for any subsample of $n' \le n$ values from $(x_1, y_1), \dots, (x_n, y_n)$, the rank
correlation coefficient $t'$ (the sample value of $\tau$) is such that

$$E(t' \mid r) = (2/\pi)\sin^{-1}r. \qquad (5)$$

If we let $n$ tend to infinity we find the familiar result

$$E(t' \mid \rho) = (2/\pi)\sin^{-1}\rho. \qquad (6)$$

It also follows, on taking $n' = n$, that the regression of $t$ on $r$ is given by

$$E(t \mid r) = (2/\pi)\sin^{-1}r \qquad (7)$$

whatever the value of $\rho$.
Runs in a ring


By D. E. BARTON AND F. N. DAVID
University College London 


Some time ago, when we were working on the distribution and properties of runs of multiple colours in
a line, we worked out the corresponding theory for runs in a ring, but were discouraged from publication
by the criticisms that essentially no new mathematical points were involved and that there was no
obvious statistical application. We do not agree with these criticisms, and the recent paper by Dawson &
Good (1958) indicates that there is some interest in the runs in a ring problem. Accordingly, we give
here a summary of results and some tables.

$r$ beads, identical except for colour, of $k$ colours are supposed to be arranged in a random order. The problem of the ring has two facets.
First, the ring may be supposed to have been built up by sampling randomly from a finite population
of beads, the beads being strung on a thread when selected and the ends of the thread tied after $r$ beads.
Secondly, a handful of beads $r_1, r_2, \dots, r_k$ ($\sum r_i = r$) can be imagined as placed in a circle, in which case
symmetries will need to be allowed for.

The first problem, where the ring is the line bent to a circle, can be solved either from first principles
(Whitworth (1886) gives a method) or by adopting the method for the line. If there are $T$ runs in the line,
i.e. $T - 1$ alternations of colour, there will be $T - 1$ runs in the ring if the same colour is at the beginning
and the end of the line, and $T$ otherwise. The appropriate multinomial term and the number of permutations
are given for $r = 2, 3, \dots, 12$ and two, three and four colours in Table 1. If $S = r - T$, the $m$th
factorial moment of $S$ is just $r/(r - m)$ times the $m$th factorial moment of the same statistic calculated for
the line (Barton & David (1957)). The same limits, the normal and Poisson, hold under the same conditions,
and the positive binomial with the correct first two moments is again a suitable approximating
function. Given that the beads are not random in the ring, but that the probabilities are those of a simple
Markoff chain, the distribution of $T$ under this hypothesis may be derived on precisely the lines worked
out by David (1947) for the two-colour-line case.
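For small $r$, the entries of Table 1 can be generated by brute force. The sketch below enumerates the distinct linear permutations of a given colour composition, bends each one into a ring, and tallies the ring runs. It also checks the factorial-moment relation quoted above, taking $S$ to be the number of adjacent pairs of like colour ($r - T$), with exact fractions. Enumeration is feasible only for small $r$, and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def ring_runs(seq):
    """Number of runs when the linear sequence is bent into a ring."""
    changes = sum(1 for i in range(len(seq)) if seq[i] != seq[i - 1])  # i = 0 compares with the last bead
    return changes if changes else 1    # a one-colour ring is a single run

def line_runs(seq):
    return 1 + sum(1 for i in range(1, len(seq)) if seq[i] != seq[i - 1])

def distinct_arrangements(parts):
    """All distinct linear permutations of beads with colour counts `parts`."""
    beads = [colour for colour, count in enumerate(parts) for _ in range(count)]
    return set(permutations(beads))

def ring_run_counts(parts):
    """A Table 1 style row: number of linear permutations by number of ring runs."""
    return dict(sorted(Counter(ring_runs(s) for s in distinct_arrangements(parts)).items()))

def factorial_moment(values, m):
    """Exact m-th factorial moment E{S(S-1)...(S-m+1)} over equally likely values."""
    total = Fraction(0)
    for s in values:
        prod = 1
        for j in range(m):
            prod *= s - j
        total += prod
    return total / len(values)

print(ring_run_counts((2, 2)))         # two colours, r = 4
print(ring_run_counts((3, 1, 1, 1)))   # four colours, r = 6

# S = r - T counts like neighbours; compare ring and line factorial moments.
parts = (3, 2, 1)
r = sum(parts)
arr = distinct_arrangements(parts)
for m in (1, 2):
    ring = factorial_moment([r - ring_runs(s) for s in arr], m)
    line = factorial_moment([r - line_runs(s) for s in arr], m)
    print(m, ring / line, Fraction(r, r - m))
```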


Table 1a. (Two colours.) Ring permutations in repeated sampling
(Probabilities are obtained by dividing the number of permutations by the appropriate multinomial
term.)

    Multinomial                                 Runs
       term      r    Partition      2      4      6      8     10     12
          2      2    (1²)           2
          3      3    (21)           3
          4      4    (31)           4
          6      4    (2²)           4      2
          5      5    (41)           5
         10      5    (32)           5      5
          6      6    (51)           6
         15      6    (42)           6      9
         20      6    (3²)           6     12      2
          7      7    (61)           7
         21      7    (52)           7     14
         35      7    (43)           7     21      7
          8      8    (71)           8
         28      8    (62)           8     20
         56      8    (53)           8     32     16
         70      8    (4²)           8     36     24      2
          9      9    (81)           9
         36      9    (72)           9     27
         84      9    (63)           9     45     30
        126      9    (54)           9     54     54      9
         10     10    (91)          10
         45     10    (82)          10     35
        120     10    (73)          10     60     50
        210     10    (64)          10     75    100     25
        252     10    (5²)          10     80    120     40      2
         11     11    (10, 1)       11
         55     11    (92)          11     44
        165     11    (83)          11     77     77
        330     11    (74)          11     99    165     55
        462     11    (65)          11    110    220    110     11
         12     12    (11, 1)       12
         66     12    (10, 2)       12     54
        220     12    (93)          12     96    112
        495     12    (84)          12    126    252    105
        792     12    (75)          12    144    360    240     36
        924     12    (6²)          12    150    400    300     60      2


11,550 


132 
660 
1,980 
2,970 
3,960 
7,920 
5,544 
13,860 
18,480 
16,632 
27,720 
34,650 


Table 1b. (Three colours.) Ring permutations in repeated sampling
(Probabilities are obtained by dividing the number of permutations by the appropriate multinomial
term.)


5 6 7 8 9 10 
10 

24 6 

36 24 

42 21 

56 28 14 

70 70 28 

64 48 

96 72 48 8 

112 144 96 12 

128 176 144 56 

90 90 

144 144 108 36 

162 252 216 54 

162 162 162 54 18 

198 333 378 225 54 

216 396 486 378 132 

120 150 

200 250 200 100 

220 400 400 150 

240 300 360 180 80 10 
280 550 760 580 240 20 
320 700 1,100 1,130 680 180 
300 600 900 780 380 100 
154 231 

264 396 330 220 

286 594 660 330 

330 498 660 440 220 55 
374 836 1,320 1,210 660 110 
352 528 792 528 352 88 
418 957 1,716 1,848 1,276 517 
440 1,100 2,046 2,508 2,024 880
462 1,188 2,310 3,058 2,002 1,408 
192 336 

336 588 504 420 

360 840 1,008 630

432 756 1,080 900 480 180
480 1,200 2,088 2,220 1,440 360
480 840 1,440 1,200 960 360
552 1,416 2,880 3,630 3,120 1,620
576 1,608 3,384 4,740 4,640 2,640
576 1,488 3,168 4,176 3,840 2,304 
624 1,812 4,104 6,396 7,008 5,076 
648 1,944 4,536 7,506 8,712 6,912 


Table 1c. (Four colours.) Ring permutations in repeated sampling
(Probabilities are obtained by dividing the number of permutations by the appropriate multinomial
term.)

    Multinomial                                            Runs
       term      r    Partition      4      5      6      7      8      9     10     11     12
         24      4    (1⁴)          24
         60      5    (21³)         30     30
        120      6    (31³)         36     72     12
        180      6    (2²1²)        36     72     72
        210      7    (41³)         42    126     42
        420      7    (321²)        42    126    182     70
        630      7    (2³1)         42    126    252    210
        336      8    (51³)         48    192     96
        840      8    (421²)        48    192    336    240     24
      1,120      8    (3²1²)        48    192    416    320    144
      1,680      8    (32²1)        48    192    496    640    304
      2,520      8    (2⁴)          48    192    576    960    744
        504      9    (61³)         54    270    180
      1,512      9    (521²)        54    270    540    540    108
      2,520      9    (431²)        54    270    720    810    540    126
      3,780      9    (42²1)        54    270    810  1,350  1,080    216
      5,040      9    (3²21)        54    270    900  1,620  1,530    666
      7,560      9    (32³)         54    270    990  2,160  2,700  1,386
        720     10    (71³)         60    360    300
      2,520     10    (621²)        60    360    800  1,000    300
      5,040     10    (531²)        60    360  1,100  1,600  1,320    560     40
      7,560     10    (52²1)        60    360  1,200  2,400  2,520    960     60
      6,300     10    (4²1²)        60    360  1,200  1,800  1,800    840    240
     12,600     10    (4321)        60    360  1,400  3,100  4,050  2,840    790
     18,900     10    (42³)         60    360  1,500  3,900  6,300  5,340  1,440
     16,800     10    (3³1)         60    360  1,500  3,000  5,100  4,440  1,740
     25,200     10    (3²2²)        60    360  1,600  4,400  7,700  7,640  3,440
990 11 (813) 66 462 462 


3,960 11 721° 66 462 1,122 1,650 660 
9,240 11 sity 66 462 1,562 2,750 2,640 1 
13,860 ll (622 1) 66 Fi Lo d hos ^ 
13,860 541* 66 46 , „ » 
27,720 11 6321 66 462 2,02 5,170 8,272 7,612 3,652 484 
| 41,580 11 (52%) 66 462 2,112 6,270 12,012 re 
34,650 11 (4221) | 66 462 2,12 5,610 9,570 Hes UH 
40.200 11 | (4981) | 66 462 2,222 6,980 11,550 22604 16.258 5038 
(432%) | 66 402 2,332 7,480 16,060 22,004 15, > 


69,300 

92,400 H (3° 2) 66 462 2,442 8,250 18,810 27,654 24,618 10,098 
1,320 12 913) 72 576 672 

5,940 12 21 72 576 1,512 2,520 1,260 120 


0 3.300 
15,840 12 7312 72 576 2,112 4,320 4,68 , 
CEN a) T3 me ama e ow o 
27.720 : 72 576 2, f 7 , , , 
SEM Eo ee im e miw i ame ie 
83,160 62% | 72 576 2 : : , 
33,204 12 T 1) 72 576 2,502 5,760 8,928 8,004 E zd f 
83,160 12 | (5421) | 72 576 2,952 9,000 18,324 23,616 28080 14,016 2544 
110.880 12 | (5331) 2 576 3,072 10,080 21,624 30816 40,584 24,768 4308 
166,320 12 | (532) |72 576 3,102 19800 24.120 36708 36,144 21,168 5,760 
138,600 12 (4231) | 72 576 3,192 10, , 54.288 59,184 36,288 10,440 


40 31,500 
277.200 2 | dem nsns 'OG0 66,528 80,784 58,008 18,420 
277,5 2 576 3432 13,320 36,060 66, i : 
    369,600     12    (3⁴)          72    576  3,562 14,400 41,040 80,448 107,424 88,128 33,960
The second problem, discussed partially by Jablonski (1892), is probably the one which Whitworth
really had in mind. Jablonski imagined $r_i$ ($i = 1, 2, \dots, k$) beads of $k$ different colours set down in a ring.
He enumerated the total number of different arrangements which could be made of these beads, allowing
for rotations and symmetries but not for turning the ring over. Except where there are common factors
among the $r_i$, the total number of arrangements will be

$$(r - 1)!\Big/\prod_{i=1}^{k} r_i!.$$

Further, the distribution of the number of runs will be the same as the distribution of the number of runs
in the linear problem with the common factor $r$ cancelled out. We show here that Jablonski's method
for the enumeration of different arrangements in the ring may be shortened and extended to give the
distribution of runs also. For clarity we will refer to these runs as Jablonski runs.

Consider any given arrangement in the ring, which we shall call a ring permutation, A. There will be
r linear arrangements of A, say A_1, A_2, ..., A_r, which we may obtain by cutting the ring at the r possible
points. The set of these, which we will call S(A), consists of the r cyclic permutations of any one of them,
and we will let S(A) contain just m_j different linear permutations. If d_j is one of the divisors of the
highest common factor h, say, of r_1, r_2, ..., r_k, then m_j = r/d_j. Suppose there are n divisors and let

$$1 = d_1 < d_2 < \cdots < d_n = h.$$

Further, let A_j be that member of S(A) consisting of the juxtaposition of d_j similar arrays of m_j beads with
r_1/d_j, r_2/d_j, ..., r_k/d_j beads of the respective colours. If Q_j d_j/r is the number of ring permutations
which give linear arrays with just m_j different line permutations, then

$$\sum_{j:\,d_i \mid d_j} Q_j = \frac{(r/d_i)!}{\prod_{l=1}^{k}(r_l/d_i)!}, \qquad i = 1, 2, \ldots, n.$$


These linear equations for the {Q_j} have a unique solution, so that the total number of Jablonski
arrangements is

$$J = \frac{1}{r}\sum_{j=1}^{n} d_j Q_j.$$
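Jablonski's equations are triangular in the divisor ordering: starting from the largest divisor d_n = h, each Q_i equals T_{d_i} = (r/d_i)!/Π(r_l/d_i)! minus the Q_j already found for the proper multiples d_j of d_i. The following is a minimal Python sketch of that back-substitution, assuming only the definitions above; the function names `T`, `jablonski_Q` and `jablonski_total` are illustrative and do not come from the paper.

```python
from functools import reduce
from math import factorial, gcd


def T(parts, d):
    """T_d = (r/d)! / prod_i (r_i/d)!: linear arrangements built from d copies of one block."""
    out = factorial(sum(parts) // d)
    for ri in parts:
        out //= factorial(ri // d)  # each partial quotient is still an integer
    return out


def jablonski_Q(parts):
    """Solve sum_{j: d_i | d_j} Q_j = T_{d_i}, working down from d_n = h."""
    h = reduce(gcd, parts)
    divisors = [d for d in range(1, h + 1) if h % d == 0]  # 1 = d_1 < ... < d_n = h
    Q = {}
    for di in reversed(divisors):
        # every proper multiple d_j of d_i is larger, so its Q_j is already known
        Q[di] = T(parts, di) - sum(Q[dj] for dj in divisors if dj > di and dj % di == 0)
    return Q


def jablonski_total(parts):
    """J = (1/r) * sum_j d_j Q_j, the number of distinct ring permutations."""
    r = sum(parts)
    total = sum(d * q for d, q in jablonski_Q(parts).items())
    assert total % r == 0  # each ring permutation accounts for r/d_j linear ones
    return total // r
```

For the composition (6, 4, 2) this gives Q_1 = 13,800, Q_2 = 60 and J = 1160, i.e. 1150 + 10.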
Let φ(d) denote Euler's φ-function, i.e. if d is a positive integer, φ(d) will denote the number of positive
integers not exceeding d which are relatively prime to d. We have that

$$J = \frac{1}{r}\sum_{j=1}^{n}\phi(d_j)\,T_{d_j}, \qquad T_{d_j} = \frac{(r/d_j)!}{\prod_{i=1}^{k}(r_i/d_j)!}.$$


Now let Q_j(t) d_j/r be the number of those ring permutations with linear arrangements containing just r/d_j
line permutations which have t ring runs; let A^(j) be such a ring permutation and A_1^(j) a corresponding
line permutation. A_1^(j) consists of d_j similar line arrays, each of which is an unrolling of a ring permutation of
r/d_j beads of colour composition (r_1/d_j, ..., r_k/d_j), and this ring permutation has t/d_j ring runs. If p_d(t) is the
proportion of permutations of elements (r_1/d, ..., r_k/d) with t/d ring runs where repetitions are allowed,
then

$$\sum_{j:\,d_i \mid d_j} Q_j(t) = T_{d_i}\,p_{d_i}(t), \qquad i = 1, 2, \ldots, n,$$

and

$$J(t) = \frac{1}{r}\sum_{j=1}^{n}\phi(d_j)\,T_{d_j}\,p_{d_j}(t)$$

is the number of ring permutations with t ring runs when repeated ring permutations are not allowed.
A worked example will illustrate the method given in the previous section. Suppose r = 12 and we
have three colours (6, 4, 2). It is seen that h = 2, so that

Q_1 = 13,800,  Q_2 = 60

and J = 1150 + 10 = 1160.

Or alternatively, since φ(1) = φ(2) = 1, we have J = (1/12)(13,860 + 60) = 1160.
Further, T_1 p_1(t)/r takes the values 2, 9, 46, 118, 240, 302½, 260, 135, 40, 2½ as t runs through the
integers 3, 4, 5, ..., 12, whilst T_2 p_2(t)/r takes the values 1, 1½, 2, ½ at the values 6, 8, 10 and 12 and is zero
elsewhere. Thus J(t) takes the values 2, 9, 46, 119, 240, 304, 260, 137, 40, 3, with a total of 1160 (= J) as
expected. Values of J(t) are given in Table 2 for those partitions of r ≤ 12 where the highest common
factor of the parts is 2 or more. Table 3 gives the values of Euler's φ-function necessary for the
enumeration of ring runs up to and including r = 12.
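As a check on the φ-function formula for J and on the ring-run counts, one can enumerate every distinct linear arrangement, reduce each to a canonical rotation, and tally ring runs directly. The sketch below is a brute-force check, not the method of the paper. It assumes r is small, because it visits all T_1 linear arrangements, and all function names in it are hypothetical.

```python
from collections import Counter
from functools import reduce
from math import factorial, gcd


def euler_phi(d):
    """Euler's phi: how many of 1..d are relatively prime to d."""
    return sum(1 for j in range(1, d + 1) if gcd(j, d) == 1)


def T(parts, d):
    """(r/d)! / prod_i (r_i/d)!, built up one exact division at a time."""
    out = factorial(sum(parts) // d)
    for ri in parts:
        out //= factorial(ri // d)
    return out


def J_by_phi(parts):
    """J = (1/r) * sum over divisors d of h of phi(d) * T_d."""
    r, h = sum(parts), reduce(gcd, parts)
    return sum(euler_phi(d) * T(parts, d) for d in range(1, h + 1) if h % d == 0) // r


def linear_arrangements(parts):
    """Yield each distinct linear arrangement with parts[c] beads of colour c."""
    counts, seq, n = list(parts), [], sum(parts)

    def extend():
        if len(seq) == n:
            yield tuple(seq)
            return
        for colour in range(len(counts)):
            if counts[colour]:
                counts[colour] -= 1
                seq.append(colour)
                yield from extend()
                seq.pop()
                counts[colour] += 1

    yield from extend()


def ring_runs(seq):
    """Ring runs: positions whose clockwise neighbour has a different colour."""
    n = len(seq)
    return sum(seq[i] != seq[(i + 1) % n] for i in range(n))


def J_distribution(parts):
    """Brute-force J(t): distinct ring permutations (rotation classes) with t ring runs."""
    seen, dist = set(), Counter()
    for s in linear_arrangements(parts):
        canon = min(s[i:] + s[:i] for i in range(len(s)))  # one representative per class
        if canon not in seen:
            seen.add(canon)
            dist[ring_runs(canon)] += 1
    return dist
```

Comparing `dict(sorted(J_distribution((6, 4, 2)).items()))` with the values of J(t) listed in the example gives a direct check of the tabulation.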

We may, if we choose, regard the different ring permutations as forming a fundamental probability
set. The jth moment of the distribution of runs is given by

$$\mu_j' = \frac{1}{J}\sum_t t^j J(t).$$


Table 2a. Jablonski runs. (Two colours.)
(Only those distributions are given which differ from those of Table 1 divided by r.)


Partition 


6 

188 : 16 

128 10 22 41 40 16 

318 10 30 61 90 79 38 1 
z g 8% gi se = 54 

1160 E 2 9 46 119 240 304 260 137 40 3 

1542 12 2 9 48 134 282 395 388 220 60 4 
2 9 54 163 378 627 726 579 288 70 
Table 2c. Jablonski runs. (Four colours.)


Partition 


120 96 
36 150 390 633 534 147 
780 1,698 2,260 1,536 
48 276 1,020 2,628 4,524 4,938 
48 296 1,200 3,420 6,704 8,952 


Table 3. Euler's φ-function

d       1  2  3  4  5
φ(d)    1  1  2  2  4


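Simple formulae do not flow from this expression, but we note that if M_1 is the mean number of runs when
repetitions are allowed, and M_d is the corresponding mean for the reduced composition (r_1/d, ..., r_k/d), then

$$\mu_1' = M_1 + \frac{1}{rJ}\sum_{j=1}^{n}\phi(d_j)\,T_{d_j}\,\bigl(d_j M_{d_j} - M_1\bigr).$$

It is seen that μ'_1 ≥ M_1,
with equality if, and only if, the highest common factor of (r_1, ..., r_k) is unity.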
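The relation between μ'_1 and M_1 can be checked exactly with rational arithmetic. The sketch assumes M_d = (r/d){1 − Σ (r_i/d)(r_i/d − 1)/((r/d)(r/d − 1))}, which is the mean number of ring runs over all arrangements of the reduced composition, i.e. the expected number of unlike neighbouring pairs round the ring. It also assumes μ'_1 = M_1 + (1/(rJ)) Σ φ(d) T_d (d M_d − M_1). The helper names are illustrative.

```python
from fractions import Fraction
from functools import reduce
from math import factorial, gcd


def euler_phi(d):
    """Euler's phi: how many of 1..d are relatively prime to d."""
    return sum(1 for j in range(1, d + 1) if gcd(j, d) == 1)


def T(parts, d):
    """(r/d)! / prod_i (r_i/d)!."""
    out = factorial(sum(parts) // d)
    for ri in parts:
        out //= factorial(ri // d)
    return out


def mean_runs_allowing_repeats(parts):
    """M for a composition: r * Pr(two neighbouring beads on the ring differ in colour)."""
    r = sum(parts)
    same = Fraction(sum(ri * (ri - 1) for ri in parts), r * (r - 1))
    return r * (1 - same)


def mean_jablonski_runs(parts):
    """Return (mu'_1, M_1) with mu'_1 = M_1 + (1/(rJ)) sum_d phi(d) T_d (d M_d - M_1)."""
    r, h = sum(parts), reduce(gcd, parts)
    divisors = [d for d in range(1, h + 1) if h % d == 0]
    J = Fraction(sum(euler_phi(d) * T(parts, d) for d in divisors), r)
    M1 = mean_runs_allowing_repeats(parts)
    excess = sum(
        euler_phi(d) * T(parts, d)
        * (d * mean_runs_allowing_repeats([ri // d for ri in parts]) - M1)
        for d in divisors
    )
    return M1 + excess / (r * J), M1
```

For (6, 4, 2) this gives μ'_1 = 8 + 1/290 against M_1 = 8. That is the same mean as Σ t J(t)/J computed from the values of J(t) in the worked example.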
Some applications of Meijer-G functions to distribution problems in statistics


By D. G. KABE
Karnatak University, Dharwar (India)

1. Although the Fourier transform is recognized to be a powerful tool in statistical distribution theory,
the Mellin transform seems to have been neglected. Epstein (1948) has used Mellin transforms to derive
certain univariate distribution functions and Nair (1939) has indicated their application to multivariate
problems. The Mellin transform of the frequency function f(x), (0 < x < ∞), of a random variable X, is
defined to be

$$g(s) = \int_0^\infty x^{s-1} f(x)\,dx, \qquad (1)$$

the inverse transform being

$$f(x) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} x^{-s}\,g(s)\,ds. \qquad (2)$$
Obviously the transform g(s) is the moment μ'_{s−1} of X; in particular, if this is given by

$$g(s) = \prod_{j=1}^{n}\frac{\Gamma(b_j+s)\,\Gamma(a_j+1)}{\Gamma(a_j+s)\,\Gamma(b_j+1)}, \qquad (3)$$

then f(x) is given by

$$f(x) = \prod_{j=1}^{n}\frac{\Gamma(a_j+1)}{\Gamma(b_j+1)}\cdot\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} x^{-s}\prod_{j=1}^{n}\frac{\Gamma(b_j+s)}{\Gamma(a_j+s)}\,ds. \qquad (4)$$

However, the last integral represents the Meijer-G function

$$G^{n,0}_{n,n}\left(x \,\middle|\, \begin{matrix} a_1, \ldots, a_n \\ b_1, \ldots, b_n \end{matrix}\right).$$

Erdélyi et al. (1953) have studied these functions extensively, and it is possible to apply some of their
results to obtain explicit expressions for f(x) in such cases.

Now it is known that the moments μ'_{s−1} of many statistics occurring in multivariate analysis can be
expressed in terms of Γ-functions as in (3). Nair (1939) has considered some of these statistics and proved
that their frequency function f(x) appears as the solution of the differential equation

$$\left[\prod_{j=1}^{n}\Bigl(x\frac{d}{dx} - b_j\Bigr) - x\prod_{j=1}^{n}\Bigl(x\frac{d}{dx} - a_j + 1\Bigr)\right] f(x) = 0. \qquad (5)$$

However, it is known that the Meijer-G function satisfies the above differential equation (Erdélyi et al.
(1953), p. 210, equation 1). Nair proceeds to solve this in the special cases n = 1 and n = 2. In the next
section we shall consider one of Nair's examples and obtain his results by making use of known explicit
expressions for the G-functions.
2. Consider the distribution of the generalized correlation ratio U, whose (s − 1)th moment is (Wilks,


1932, p. 484) ^ n Ti(n—j) r TIA -- 
2o Tin) h rug. (6) 
so = 15655, Fü 


From (4) we have that the frequency function of U is given by 


a 


n. Tinj) ji re (7) 
= LL. Goa U * 
KO) 7 ll FN bib. b. 
where a, = }(n—2—j) and b; = 30 25) = 2. n). 


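We consider the following special values of n.

(1) n = 1. From Erdélyi et al. (1953, p. 208, equation 5), we find that

$$G^{1,0}_{1,1}\left(U \,\middle|\, \begin{matrix} a_1 \\ b_1 \end{matrix}\right) = \frac{U^{b_1}(1-U)^{a_1-b_1-1}}{\Gamma(a_1-b_1)}, \qquad 0 < U < 1, \qquad (8)$$

so that from (7) and (8) we have the frequency function of U as

$$f(U) = \frac{\Gamma(a_1+1)}{\Gamma(b_1+1)\,\Gamma(a_1-b_1)}\,U^{b_1}(1-U)^{a_1-b_1-1}, \qquad (0 < U < 1). \qquad (9)$$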
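For n = 1 the claim is that the beta-type density Γ(a_1+1)/(Γ(b_1+1)Γ(a_1−b_1)) U^{b_1}(1−U)^{a_1−b_1−1} has the Mellin transform (3), namely Γ(b_1+s)Γ(a_1+1)/(Γ(a_1+s)Γ(b_1+1)). The sketch below checks this numerically with a midpoint rule, using only the Python standard library. The parameter values a_1 = 4·5 and b_1 = 1 are arbitrary choices for illustration.

```python
from math import gamma


def density_n1(x, a, b):
    """Gamma(a+1)/(Gamma(b+1)Gamma(a-b)) * x^b (1-x)^(a-b-1) on 0 < x < 1 (needs a > b > -1)."""
    c = gamma(a + 1) / (gamma(b + 1) * gamma(a - b))
    return c * x ** b * (1 - x) ** (a - b - 1)


def mellin_numeric(f, s, steps=20_000):
    """Midpoint-rule value of g(s) = integral over (0, 1) of x^(s-1) f(x); f vanishes beyond 1."""
    h = 1.0 / steps
    return h * sum(((k + 0.5) * h) ** (s - 1) * f((k + 0.5) * h) for k in range(steps))


def mellin_closed_form(s, a, b):
    """Equation (3) with n = 1: Gamma(b+s)Gamma(a+1) / (Gamma(a+s)Gamma(b+1))."""
    return gamma(b + s) * gamma(a + 1) / (gamma(a + s) * gamma(b + 1))
```

At s = 1 the closed form equals 1, which is the statement that the density integrates to one.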
(2) n = 2. In this case we have 

T(n- T2029 (ao 15 0 „ i; (10) 

HU) = iu 30 40, K — 404 


But, from Erdelyi et al. (1953, p. 209, equation 10) we find that 

30 — 40,09 — 4)+ H = 2-5-1 a(o | PEA 
on(u 100-40, 01 998 ` ul 
= 2-7-1 [T(n — p) Ui- (1—4 U)", (11) 


and using (8), the frequency function of U is 
1T(n-2) die- Y- (0<U<1). (12) 


— — 


AU) FD FEN 
3. As a further application of Meijer-G functions, let us consider the following problem. Let ρ_1 and ρ_2
be the canonical correlations between two sets of normal variables (X_1, X_2) and (Y_1, Y_2, ..., Y_p), (p > 2),
and let the variables Q and Z be defined by

Q = ρ_1²ρ_2²,  Z = (1 − ρ_1²)(1 − ρ_2²).

Now if q and z are the sample values of Q and Z, and f(q, z) their joint frequency function, then from
a result of Girshick's (1939) we obtain


$$E(q^{s-1}z^{t-1}) = \int_0^\infty\!\!\int_0^\infty q^{s-1}z^{t-1} f(q,z)\,dq\,dz \qquad (13)$$
ES T(n—1) I'(p-5—2)T(n—p— 3-2) 10 
~ T(p-1) TG =I) T(n+s—4-+ 2t) (14) 
= g(s, t) (say). 


However, (13) is the double Mellin transform of the function f(q, z); it is known that the inverse transform
is given by

$$f(q,z) = \frac{1}{(2\pi i)^2}\int_{c-i\infty}^{c+i\infty}\int_{c'-i\infty}^{c'+i\infty} q^{-s}z^{-t}\,g(s,t)\,ds\,dt. \qquad (15)$$


T(n—1) i? T(n—p-3-21)2 
FP. BI) FI r- Dres- Ben DX -»r [a [^ 4% Tu T4 - 20 ajo 


Now the integral within the square brackets in the last expression is the Meijer-G function 


icu (a Mac X : TTA Du ars, ’ 
From (8) we have then 
T(n—1) 7 7 io T(p— L q T 
709,2) = MUCH ech DIG-p-1j^ 2-9 (1— Jz)» a Seen er) ds 
na I(n—1) vies = p= ,) 
e ee eee a —2)|p—2 
1 I(n-1) zin-»-9 g»-, (16) 


~ 20(p—1)T(n—p—1) 
a result which agrees with that given by Girshick (1939). 


The author wishes to thank Mr N. U. Prabhu of Karnatak University, Dharwar, for many helpful 
suggestions. 
A note on ‘Further contributions to multivariate confidence bounds’
Biometrika 44 (1957), pp. 399-410 


By S. N. ROY AND R. GNANADESIKAN 
Institute of Statistics, University of North Carolina 


In sections 4b and 4c of the paper, dealing with the general linear hypothesis, starting from (4·15), the
confidence statement (4·19) and similar statements involving the truncation of columns of M are
obtained. The truncation procedure described is essentially equivalent to a truncation of variates and
enables one to start with a u-variate problem and study all the associated (u − 1)-variate problems,
(u − 2)-variate problems, and so on until we get down to the u univariate problems. In this procedure,
therefore, a total of (2^u − 1) confidence statements are obtained.

There is, however, another type of truncation which is very often of great interest to us, especially
in the univariate general linear hypothesis problem where we do not have the M matrix. To illustrate this,
let us consider the problem of testing the equality of several treatment effects in an ordinary
ANOVA problem. With v treatments, the null hypothesis may be stated as H_0: t_1 = t_2 = ... = t_v. If H_0 is
rejected, then a question of some interest usually is what one could say about subsets of the treatment
effects. For example, what can be said about H_0: t_1 = t_2 = t_3? Similar problems are of interest in the
multivariate situation also. For example, in the illustrative example of section 51, if the six schools
had turned out to be significantly different, i.e. if H_0: ξ_1 = ξ_2 = ... = ξ_6 is rejected, then this is possibly
caused by one or two of the six schools being different from the rest. So, for instance, we might be
interested in studying departures from null hypotheses like H_0: ξ_1 = ξ_2 = ξ_3.

This second type of problem can easily be seen to be a problem of truncating the rows of the C matrix. 
This can be done by writing b* = Ob in (4·15), so that we have for our starting point, instead of (4·15),
the statement 


a* X*' A(A14,)- Cib* — (a*'S*a*)i [sc,(u, s, n —)]* 
«a*]* b*«a* X* 4,(414,)1 Cj b* + (a*’S*a)} [sc,(u, &, n —r)]! 


for all non-null a*'s and all non-null b*'s satisfying b*[C,(414,)7* Ci) b* = 1. Taking suitable elements
of b* equal to zero, we can now obtain the desired truncations on the rows of the matrix. Thus we 
have, in fact, not merely the (2^u − 1) statements derived in the paper, but (2-1)x (2—1) confidence
statements which include the original (2^u − 1) statements and also others obtained by truncating the
rows of the C matrix. The joint confidence coefficient is, of course, ≥ (1 − α), for a preassigned α.


Selection of the population with the largest mean when comparisons
can be made only in pairs


By RITA J. MAURICE
University College London


1. INTRODUCTION

It has been pointed out by Bechhofer (1954, 1957) that when k populations are to be ranked on the basis of
their means, experimental designs which [...] and so reduce the underlying variance of the experiment
serve [...] analysis of variance. Thus, if comparisons can be made only in pairs among k populations, the
standard experimental design is an incomplete block design with two plots per block. The simplest
balanced design requires ½k(k − 1) blocks, so that each population appears in a block with each of the
other k − 1 populations. However, if k were large this would require a large number of blocks, and some
balanced design involving fewer blocks might be employed. The block effects may be constants which add
to zero over one replication of the experiment. [...]

If the populations are to be divided into two groups (the ‘best’ and the ‘other’ populations), an alternative
procedure of successive elimination, following the procedure familiar in tournaments, might
be used if k and the number of best populations were equal to a power of two. Initially the k populations
would be paired at random and a comparison made within blocks. The ½k populations with the larger
means would then be paired again at random and similar comparisons made, resulting in ¼k populations
to be further compared. This procedure would be continued until the number of populations remaining
was equal to the number to be selected. If only one population is to be selected, one replication of this
procedure requires k − 1 blocks.

The simplest case for which these two methods can be compared is k = 4 with the selection of one
of these as the best. For this case one replication of the fully balanced design requires only six blocks, and
it seems reasonable to assume that it would be used. For the cup-tie procedure, a multiple of three blocks
would be required.

If it is desired to detect a difference δ between the largest and second largest means with probability at
least P, it must be assumed that the third and fourth means are equal to the second largest (Bechhofer
(1954); Somerville (1954)). The most unfavourable configuration of the population means is therefore
θ_1 − δ = θ_2 = θ_3 = θ_4, where θ_1 is the largest mean.


2. FOUR POPULATIONS: BLOCK EFFECTS CONSTANT 


A fully balanced incomplete block design for four populations requires six blocks for each replication, 
each population being tested three times in one replication. If the effects of a block on the result are 
constant the model is of the form 


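$$y_{ij} = \theta_i + \beta_j + z_{ij} \qquad (i = 1, 2, 3, 4;\; j = 1, 2, \ldots, 6),$$

where Σ_{j=1}^{6} β_j = 0, E(z) = 0, E(z²) = σ². For this model the estimate of θ_i is given by

$$t_i = \tfrac{1}{2}\Bigl\{\sum_j y_{ij} - \tfrac{1}{2}\sum_j (y_{ij} + y_{i'j})\Bigr\} + \frac{1}{12}\sum_{i,j} y_{ij}$$

(Cochran & Cox (1950), p. 264). Here i' indicates the population tested in the same block as population i.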
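The difference between two population means, θ_i − θ_l, is therefore estimated by

$$t_i - t_l = \tfrac{1}{2}\Bigl\{\sum_j y_{ij} - \tfrac{1}{2}\sum_j (y_{ij} + y_{i'j})\Bigr\} - \tfrac{1}{2}\Bigl\{\sum_j y_{lj} - \tfrac{1}{2}\sum_j (y_{lj} + y_{l'j})\Bigr\},$$

which reduces to

$$t_i - t_l = \tfrac{1}{4}\sum_j (y_{ij} - y_{i'j}) - \tfrac{1}{4}\sum_j (y_{lj} - y_{l'j}),$$

and is therefore based entirely upon within-block comparisons.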

The variance of this estimate of the difference between two means is σ², and the covariance of two such
estimates t_i − t_l and t_i − t_m is ½σ². If there are n replications of the experiment the variance becomes σ²/n.
Thus (t_1 − t_j)√n/σ (j = 2, 3, 4) has the variance-covariance matrix

$$\begin{pmatrix} 1 & \tfrac12 & \tfrac12 \\ \tfrac12 & 1 & \tfrac12 \\ \tfrac12 & \tfrac12 & 1 \end{pmatrix}.$$


The probability of a correct choice of the population with the largest mean is given by

$$\Pr\{t_1 > \max(t_2, t_3, t_4)\} = \Pr\{t_1 - t_j > 0 \;(j = 2, 3, 4)\} = F\bigl(\delta\sqrt{n}/\sigma,\ \delta\sqrt{n}/\sigma,\ \delta\sqrt{n}/\sigma\bigr),$$

where F(a_1, a_2, a_3) is the incomplete normal trivariate integral with the variance-covariance matrix given
above.
Table 1. Number of replications and cost of experiments

(i) Incomplete blocks procedure

F(x, x, x) = P | nδ²/σ² | Cost × δ²/(cσ²)
0·99 | 7·209 | 86·51
0·95 | 4·252 | 51·02
0·90 | 3·005 | 36·06
0·80 | 1·792 | 21·50
0·70 | 1·115 | 13·38
0·60 | 0·665 | 7·98
0·50 | 0·350 | 4·20
0·40 | 0·136 | 1·63
0·30 | 0·017 | 0·20
0·25 | 0·000 | 0·00

(ii) Cup-tie procedure

{F(x')}² = P | n'δ²/σ² | Cost × δ²/(cσ²)
0·99 | 13·261 |
0·95 | 7·640 | 45·84
0·90 | 5·328 | 31·97
0·80 | 3·127 | 18·76
0·70 | 1·924 | 11·54
0·60 | 1·137 | 6·82
0·50 | 0·594 | 3·56
0·40 | 0·229 | 1·37
0·30 | 0·029 | 0·17
0·25 | |


A general comparison of the performance of the two methods involves comparing the
probabilities of a correct choice when the same amount of money is spent on sampling in each case. This
requires that if n_1 is the number of replications of the balanced blocks, n_2 (the number of replications for
the cup-tie) equals twice n_1. The probabilities of a correct decision then become F(x, x, x) and {F(x)}². If
the cup-tie method is always cheaper, then

F(x, x, x) < {F(x)}²  for positive x.

Or, writing both sides in an alternative form (Paulson (1952)),

$$E\{F^3(t + x\sqrt{2})\} < \bigl[E\{F(t + x\sqrt{2})\}\bigr]^2,$$

where t is a unit normal variate. In neither form has the inequality been proved to hold. This seems to
indicate that, if the inequality does hold for all positive x, the difference between the two probabilities is
small for at least some values of x.
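The comparison of F(x, x, x) with {F(x)}² can be explored numerically. The sketch below assumes the equicorrelated representation behind the alternative form: with V_1, ..., V_4 independent unit normals, (V_1 − V_j)/√2 has unit variance and correlations ½, so F(x, x, x) = E{F³(V + x√2)}. For the cup-tie it assumes that each pairwise comparison in one replication has standardized margin x' = δ√n'/(σ√2), so that P = {F(x')}². The integral is evaluated with Simpson's rule from the Python standard library, which is an illustrative choice and not the author's method. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # unit normal: PHI.cdf plays the role of F(.), PHI.pdf its density


def simpson(g, lo, hi, m=1000):
    """Composite Simpson rule on [lo, hi]; m must be even."""
    h = (hi - lo) / m
    total = g(lo) + g(hi)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3


def p_bib(x):
    """F(x, x, x) for three unit normals with correlations 1/2, as E{F^3(V + x*sqrt(2))}."""
    shift = x * sqrt(2)
    return simpson(lambda v: PHI.pdf(v) * PHI.cdf(v + shift) ** 3, -9.0, 9.0)


def p_cup(x):
    """{F(x)}^2: the best population must win two independent pairwise comparisons."""
    return PHI.cdf(x) ** 2


def solve_increasing(f, target, lo, hi, iters=60):
    """Bisection for f(x) = target when f increases on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def table1_row(P):
    """((n d^2/s^2, cost), (n' d^2/s^2, cost)) at probability P; cost counts plots,
    12 per replication for incomplete blocks and 6 for the cup-tie (units c s^2/d^2)."""
    x = solve_increasing(p_bib, P, 0.0, 6.0)  # x = delta sqrt(n) / sigma
    x_cup = PHI.inv_cdf(sqrt(P))              # x' = delta sqrt(n') / (sigma sqrt 2)
    bib, cup = x * x, 2 * x_cup * x_cup
    return (bib, 12 * bib), (cup, 6 * cup)
```

Scanning a grid of x with `p_bib(x) < p_cup(x)` illustrates the inequality numerically only and is not a proof.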

A further comparison of the two procedures may be made by using Wald's minimax procedure (1950),
instead of P and δ, to determine the number of replications in each case. Assuming that the loss involved
in making an incorrect choice is proportional to the difference between the chosen and the largest mean,
the loss function may be written

$$E(\text{loss}) = N\sum_i (\theta_1 - \theta_i)\,p_i + 2bnc.$$

N here represents the scale of use of the chosen population and p_i is the chance of choosing population i.
It is also assumed, as above, that the cost of sampling, c, is the same for each plot; b is the number of
blocks in one replication. The expected loss is a maximum when the means are in the most unfavourable
arrangement (Somerville, 1954), and its expression then simplifies to

$$E(\text{loss}) = N\delta(1 - p_1) + 2bnc.$$
For the incomplete blocks procedure the expected loss is

$$N\delta\{1 - F(\delta\sqrt{n}/\sigma,\ \delta\sqrt{n}/\sigma,\ \delta\sqrt{n}/\sigma)\} + 12nc.$$

This function is at a maximum with respect to δ when δ = (0·878)σ/√n. The expected loss for this value
of δ is equal to

$$N\sigma(0{\cdot}3274)/\sqrt{n} + 12nc$$

(Somerville, 1954). Minimizing this maximum with respect to n gives n = (Nσ/c)^{2/3}(0·0571). The maximum
expected loss for this value of n is equal to N^{2/3}σ^{2/3}c^{1/3}(2·055) and is divided in the ratio 2:1 between the
cost of a wrong decision and the cost of sampling. When there is no restriction on the comparison of the
populations, the maximum expected loss for the minimax n is N^{2/3}σ^{2/3}c^{1/3}(1·795).

The loss function for the cup-tie procedure is

$$E(\text{loss}) = N\delta\Bigl[1 - F^2\bigl(\delta\sqrt{n}/(\sigma\sqrt{2})\bigr)\Bigr] + 6nc.$$

This function is at a maximum with respect to δ when δ√n/(σ√2) = 0·8178, and is then equal to
N√2σ(0·3032)/√n + 6nc. Minimizing this maximum with respect to n gives n = (Nσ/c)^{2/3}(0·1085). The
maximum expected loss for this value of n is again divided in the ratio 2:1 between the cost of wrong
decisions and the cost of sampling, and is equal to N^{2/3}σ^{2/3}c^{1/3}(1·953). The maximum expected loss is reduced
by N^{2/3}σ^{2/3}c^{1/3}(0·103) as compared with the maximum expected loss for the incomplete blocks procedure, but
is N^{2/3}σ^{2/3}c^{1/3}(0·158) greater than when there are no experimental restrictions.

A comparison of the expected loss of the two procedures is given in Table 3, assuming the minimax n is
used in each case. The values of δN^{1/3}σ^{-2/3}c^{-1/3} were chosen so as to make possible the use of Table 3·1 given
by Somerville (1954) for the probability of a correct choice using the incomplete blocks procedure. The
table again shows that the cup-tie procedure is more economical than the balanced incomplete blocks
procedure for these values of δ. A comparison with the unrestricted minimax procedure is made in Fig. 1.
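The minimax constants quoted above follow from two one-dimensional problems. First, maximize y{1 − P(y)} over the standardized difference y. Second, minimize Nσ·s·a/√n + 2bnc over n. The second step has the solution n = (Nσ/c)^{2/3}(s·a/(4b))^{2/3}, which puts two thirds of the loss on wrong decisions and one third on sampling. The sketch below works under those assumptions, with scale s = 1 for incomplete blocks and √2 for the cup-tie. It reuses a Simpson-rule evaluation of F(x, x, x) and a golden-section search. Both are illustrative choices and not necessarily Somerville's method, and all names are hypothetical. The argument d = δN^{1/3}σ^{-2/3}c^{-1/3} is the scaled difference in which losses are measured in units of N^{2/3}σ^{2/3}c^{1/3}.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()


def simpson(g, lo, hi, m=1000):
    """Composite Simpson rule on [lo, hi]; m must be even."""
    h = (hi - lo) / m
    total = g(lo) + g(hi)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3


def p_bib(y):
    """Pr(correct choice), incomplete blocks: F(y, y, y) with y = delta sqrt(n) / sigma."""
    return simpson(lambda v: PHI.pdf(v) * PHI.cdf(v + y * sqrt(2)) ** 3, -9.0, 9.0)


def p_cup(y):
    """Pr(correct choice), cup-tie: F(y)^2 with y = delta sqrt(n) / (sigma sqrt 2)."""
    return PHI.cdf(y) ** 2


def golden_max(g, lo, hi, iters=60):
    """Maximizer of a unimodal g on [lo, hi] by golden-section search."""
    r = (sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    gc, gd = g(c), g(d)
    for _ in range(iters):
        if gc > gd:  # maximum lies in [a, d]
            b, d, gd = d, c, gc
            c = b - r * (b - a)
            gc = g(c)
        else:        # maximum lies in [c, b]
            a, c, gc = c, d, gd
            d = a + r * (b - a)
            gd = g(d)
    return (a + b) / 2


def minimax(p, blocks, scale):
    """Return y*, a = max y(1 - p(y)), n in units (N sigma/c)^(2/3), max loss in units N^(2/3) sigma^(2/3) c^(1/3)."""
    y_star = golden_max(lambda y: y * (1 - p(y)), 0.0, 3.0)
    a = y_star * (1 - p(y_star))
    n_coef = (scale * a / (4 * blocks)) ** (2 / 3)  # from d(loss)/dn = 0
    return y_star, a, n_coef, 6 * blocks * n_coef    # = 3 x sampling cost 2*blocks*n_coef


def expected_loss(p, blocks, scale, n_coef, d):
    """Expected loss at d = delta N^(1/3) sigma^(-2/3) c^(-1/3) when the minimax n is used."""
    return d * (1 - p(d * sqrt(n_coef) / scale)) + 2 * blocks * n_coef
```

`minimax(p_bib, 6, 1.0)` and `minimax(p_cup, 3, sqrt(2))` correspond to the 2·055 versus 1·953 comparison in the text. Evaluating `expected_loss` at the tabulated values of δN^{1/3}σ^{-2/3}c^{-1/3} gives curves of the kind shown in Table 3.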


Table 3. Expected loss of minimax experimental procedures in units of N^{2/3}σ^{2/3}c^{1/3}

δN^{1/3}σ^{-2/3}c^{-1/3} | Balanced incomplete blocks procedure | Cup-tie procedure | δN^{1/3}σ^{-2/3}c^{-1/3} | Balanced incomplete blocks procedure | Cup-tie procedure
0·0 | 0·69 | 0·65 | 5·02 | 1·92 |
0·42 | 0·98 | 0·95 | 5·44 | 1·84 | 1·71
0·84 | 1·25 | 1·21 | 5·86 | 1·75 | 1·62
1·26 | 1·48 | 1·43 | 6·28 | 1·65 | 1·52
1·67 | 1·67 | 1·61 | 6·70 | 1·55 | 1·42
2·09 | 1·82 | 1·76 | 7·11 | 1·44 | 1·38
2·51 | 1·93 | 1·86 | 7·53 | 1·35 | 1·28
2·93 | 2·01 | 1·92 | 7·95 | | 1·15
3·35 | 2·05 | 1·95 | 8·37 | 1·17 | 1·07
3·77 | 2·05 | 1·95 | 10·46 | 0·86 | 0·81
4·19 | 2·04 | 1·92 | 12·56 | 0·73 | 0·69
4·60 | 1·99 | 1·86 | | |


3. FOUR POPULATIONS: BLOCK EFFECT A RANDOM VARIABLE


Instead of considering the block effects constant, it may be more realistic, and may make possible
conclusions of wider validity, to assume that the block effects vary and are distributed normally with
variance σ_b², independently of the residual values. The model then becomes

$$y_{ij} = \theta_i + b_j + z_{ij},$$


[Figure: expected loss in units of N^{2/3}σ^{2/3}c^{1/3} for the three procedures listed in the caption.]
Fig. 1. Expected loss of restricted and unrestricted minimax procedures, (1) Balanced incomplete 
blocks procedure. (2) Cup-tie procedure. (3) Unrestricted procedure. 


where E(b) = 0, E(b²) = σ_b², E(z) = 0 and E(z²) = σ². Assuming that σ_b² also is known, the estimate of θ_i
for this model is given by


6 
„„ 
— i j=im 
30? 40% 
(Cochran & Cox (1950), p. 266) and the estimate of θ_i − θ_l by
et Yogi Bayt OE nu e) = AD ue) 


t= 


1. — 1 2 — — 2 — 

30? ＋ 40$ 
The variance of the difference is 2σ²(σ² + 2σ_b²)/(3σ² + 4σ_b²), and the correlation between two such
differences (t_1 − t_j) and (t_1 − t_l), j ≠ l, is ½.

Writing σ'² for this variance, the results of the previous section may be applied with σ'² in place of σ².
σ'² increases as σ_b² increases, from a lower limit of 2σ²/3 (when σ_b² = 0) to an upper limit of σ² (when
σ_b² = ∞). In this situation the incomplete blocks procedure achieves a given probability of a correct choice
more cheaply than if the block effects are constant. The analysis of the cup-tie procedure is not affected
by the change in the model, since comparisons are made only within blocks and not between blocks. If
σ_b² is sufficiently small the incomplete blocks procedure will be cheaper than the cup-tie procedure.
Comparing the costs tabulated in Tables 1 and 2, σ'² < 0·84σ² (σ_b² < 0·81σ²) is sufficient for this to hold
for all these values.
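The two thresholds in this paragraph are simple algebra on σ'²/σ² = 2(1 + 2s)/(3 + 4s), with s = σ_b²/σ². The sketch below uses exact fractions and takes the cost columns as printed in Table 1 for P from 0·95 down to 0·30. It treats the smallest cup-tie to incomplete-blocks cost ratio as the bound on σ'²/σ², which is our reading of the argument rather than a quotation of it. The names are illustrative.

```python
from fractions import Fraction

# Costs as printed in Table 1 for P = 0.95, 0.90, ..., 0.30 (incomplete blocks, cup-tie).
BIB_COST = [51.02, 36.06, 21.50, 13.38, 7.98, 4.20, 1.63, 0.20]
CUP_COST = [45.84, 31.97, 18.76, 11.54, 6.82, 3.56, 1.37, 0.17]


def variance_ratio(s):
    """sigma'^2 / sigma^2 = 2(1 + 2s) / (3 + 4s), where s = sigma_b^2 / sigma^2."""
    s = Fraction(s)
    return 2 * (1 + 2 * s) / (3 + 4 * s)


def s_for_variance_ratio(q):
    """Invert variance_ratio: the s giving sigma'^2/sigma^2 = q (requires 2/3 <= q < 1)."""
    q = Fraction(q)
    return (3 * q - 2) / (4 * (1 - q))


# The incomplete-blocks cost scales with sigma'^2/sigma^2, so that procedure is cheaper at
# every tabulated P whenever sigma'^2/sigma^2 lies below the smallest cost ratio.
min_ratio = min(c / b for b, c in zip(BIB_COST, CUP_COST))
```

For example, `s_for_variance_ratio(Fraction("0.84"))` is 13/16, about 0·81, which is the bound on σ_b²/σ² that corresponds to σ'² < 0·84σ².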

Using the incomplete blocks procedure, the minimax value of n for this model, assuming the same loss
function as in the previous section, is (Nσ'/c)^{2/3}(0·0571), and the maximum expected loss for this value of
n is N^{2/3}σ'^{2/3}c^{1/3}(2·055). For the cup-tie procedure the maximum expected loss for the minimax n is
N^{2/3}σ^{2/3}c^{1/3}(1·953), as before. This is less than that for the incomplete blocks procedure only if

σ'^{2/3}(2·055) > σ^{2/3}(1·953),

which reduces to the condition σ_b² > 1·006σ².


Values for the expected loss of the two procedures are given in Table 4, assuming the minimax n is used.
Values are given for σ_b² equal to 0·5σ², σ², 1·5σ², 2σ² and 3σ². As σ_b² tends to infinity, the expected loss
tends to the values given in Table 3 for the incomplete blocks procedure.
Table 4. Expected loss of minimax procedures in units of N^{2/3}σ^{2/3}c^{1/3}

                         | Balanced incomplete blocks procedure                                        |
δN^{1/3}σ^{-2/3}c^{-1/3} | σ_b² = 0·5σ² | σ_b² = σ² | σ_b² = 1·5σ² | σ_b² = 2σ² | σ_b² = 3σ² | Cup-tie
0  | 0·64 | 0·65 | 0·66 | 0·66 | 0·67 | 0·65
1  | 1·29 | 1·30 | 1·31 | 1·32 | 1·32 | 1·29
2  | 1·71 | 1·73 | 1·75 | 1·75 | 1·76 | 1·72
3  | 1·89 | 1·93 | 1·95 | 1·96 | 1·98 | 1·93
4  | 1·88 | 1·93 | 1·96 | 1·98 | 2·00 | 1·94
5  | 1·72 | 1·78 | 1·82 | 1·83 | 1·86 | 1·79
6  | 1·49 | 1·55 | 1·59 | 1·62 | 1·64 | 1·59
7  | 1·25 | 1·31 | 1·35 | 1·37 | 1·39 | 1·35
8  | 1·05 | 1·10 | 1·12 | 1·15 | 1·17 | 1·14
9  | 0·88 | 0·93 | 0·94 | 0·97 | 0·99 | 0·96
10 | 0·77 | 0·82 | 0·83 | 0·84 | 0·86 | 0·84


Note. The second decimal may be one or two points in error as the figures were obtained by graphical 
interpolation. 


Table 4 shows that the comparison made for the maximum holds fairly well over the entire range of
δ. If σ_b² = 0·5σ² the expected loss is less for the balanced blocks procedure than for the cup-tie.
When σ_b² = σ² the losses are very nearly equal for δN^{1/3}σ^{-2/3}c^{-1/3} less than five, but the balanced incomplete
blocks procedure is more economical for larger values of δ. For the larger values of σ_b²/σ² the cup-tie
procedure is a definite improvement.

Thus the comparison is in favour of using the standard experimental design only when σ_b²/σ² is less than
one or one and a half. In practice this ratio will often be unknown and will have to be estimated.
Unbiased estimates of θ_i and θ_i − θ_l can still be obtained by using weights estimated from the data
(Kempthorne, 1953). However, the variances of the estimates will be increased by the use of estimated
weights, and more replications will be needed to ensure a given probability of a correct selection.
Therefore, if σ_b²/σ² is unknown the advantage will probably still lie with the cup-tie procedure.


The author is indebted to Dr N. L. Johnson for suggesting consideration of the cup-tie procedure and 
for his comments during the course of the work. 
CORRIGENDA


(1) Biometrika (1957), 44, pp. 532-3 
‘A note on the mean deviation of the binomial distribution.’ 


By N. L. JOHNSON


Since the publication of this note I have received the following information on earlier 
work on the subject, of which I was unaware. 

Prof. O. Reiersol refers to p. 161 of an article by R. Frisch, ‘Solution d'un problème du
calcul des probabilités’, in Skand. Aktuar. Tidskr. 7 (1924), giving a derivation of formula (2)
of my note.

Dr L. A. Aroian refers to p. 85 of C. Jordan’s Statistique Mathématique (1927), which
quotes Frisch's result.

Dr E. L. Crow refers to a paper by J. S. Frame, ‘Mean deviation of the binomial distribution’,
Amer. Math. Mon. 52 (1945), which gives an empirical deduction of formula (2),
followed by an approximate formula for the mean deviation analogous to that implied by
my formula (4).

N. L. JOHNSON


(2) Biometrika (1953), 40, pp. 116-27 
‘On the mean successive difference and its ratio to the root mean square’


By A. R. Kamat 
P. 117, line 16, equation (4). 


Read ER) =$ for &(d) -5e 


(3) Biometrika (1958), 45, pp. 211-21 
‘Moments of sample moments of censored samples from a normal population.’ 


By J. G. Saw 


I regret that a number of mistakes have occurred in Table 4, giving values of H_i(p_1, a, b),
in the above paper.

On using it for further work, it became apparent that errors must exist when
p_1 was 0·70, 0·75 and 0·80. On checking it was found that, for example, in [...]
of (d) (equation (4·5)) an error had occurred for p_1 = 0·70, t = 2. Since (ds), (cd), [...],
(da) and (cad) were obtained using (d), this led to [...]
mistakes in the p_1 = 0·70 group of Table 4. The errors for p_1 = 0·75 and 0·80 were
similar, but since they occurred higher up the a + b scale, the resulting ‘triangular’ spread
of errors was less.

Considerable checking has been made in the tables and it is hoped that no serious errors
remain.


Corrected values of Hj p,, a, b) in Table 4 


i=4 


a=2; b=0. p,=0-70 4-0-52227 171 -- 0-61553 24 + 0-869914 
075 + 0775795 


@=3;b=0. p=070 | 4132555333 | 2101579 | + 296766 
0-80 +0-68542 12 
a=2; bzl. p,=0-70 — 107145 32 — 2-00819 7 — 2.90650 


a=0; bz3. p,=0-75 + 4932648 


0-70 4-2:30019 97 +5-069063 | + 916907 
0-75 73.550811 
0-80 4-116315 92 | 

a=3; b-1. p,=0-70 —1:90213 17 —4417150 | — 82653 
0-80 —0-93713 02 

a=2; b-3. p,=0-70 1.50575 16 +3:848057 | + 6-996046 

a=1; b=3. p,=0-75 + 0-87939 


a=0; b=4. p,=0-75 


a=4; b=0. p,=0-60 
+ 257-89569 
REVIEWS


The Mathematical Theory of Epidemics. By N. T. J. BAILEY. London: Charles
Griffin and Co. 1957. Pp. 194. 36s.


This book gives an up-to-date account of the basic mathematical theory of epidemics.
After a brief historical sketch the book discusses first the deterministic approach which
grew up between the two wars. This approach owes much to the pioneering series of papers published
jointly, between 1927 and 1939, by W. O. Kermack and A. G. McKendrick. The next, and the most
important, portion of the book concerns the introduction of probability theory into the basic models,
leading to stochastic theories of epidemics. This portion of the book must be studied carefully by all
students of the subject. Much of the work here is of very recent origin and the author himself has been
one of the main contributors to the field.


Two alternative types of model are considered here: the Reed-Frost formulation and Greenwood's
formulation. The latter is slightly simpler to handle mathematically, since it assumes that the chance
of infection is independent of the number of infectives available to transmit the disease. The final part
of this chapter considers the situation where the chance of infection does not remain constant but varies
from individual to individual.

Chapter 7 attempts to show how the earlier techniques can be bettered by using the observed varia- 
tions in the time intervals between successive cases. Chapter 8 deals with recurrent epidemics and 


[...] presented the whole subject in an extremely logical and orderly way, pointing the direction for further
work, and he makes an effective plea for more practical data to be collected in a form suitable for
mathematical analysis. The standard of mathematics required for this book is high and it is not a book 
for the non-mathematical practitioner of medicine who would require a rather more practical form of 
approach. As a first venture into a new field the book is, however, excellent and is a very welcome 
addition to the range of statistical texts now available. P. d. MOORE 


Variation and Heredity. By H. KALMUS. London: Routledge and Kegan Paul Ltd.
1958. Pp. 227. 28s. 


Dr Kalmus’s book is concerned with the causes of human variability and is the third of a new series
(Survey of Human Biology) which it is hoped will be useful [...] interested generally in human affairs,
specialists in medical or [...] sciences, [...]

Various ways in which human [...] terms without details of statistical method. Dr Kalmus [...]

He provides a very clear account of the chromosomal basis of inheritance (Mendelism) as evidenced [...]
(statistical pitfalls in interpretation of human data). Here, as [...], the book is very lucidly and
pleasantly written. [...] discusses [...] as in extrachromosomal inheritance, developmental mechanics
and quantitative inheritance, thus giving a fair picture of the still evolving state of genetical knowledge.
Besides fundamentals the book introduces us to several topics of great general importance, such as
radiation dangers, AID, Eugenics, intelligence and mental defect. In excellent chapters on geographical 
variation and the genetical theory of populations, Dr Kalmus demonstrates the unscientific character 
of such a term as ‘race’, and explains the kind of genetic concepts which ought to form the context 
of realistic thinking about so-called ‘racial differences’. The price of unrealistic and unscientific thinking 
has been a heavy one for the world. There is a price to pay for unscientific thinking in other human 
affairs also, and a book like Dr Kalmus's, though it will probably be read mostly by people already 
rather highly educated, still has a part to play, because even the university graduate is far from being 
well informed as to the full complexity of human beings, and all of us nourish some prejudices or 


misconceptions acquired in our childish years. A. R. G. OWEN 


Sources and Nature of the Statistics of the United Kingdom. Vol. II. Edited by
M. G. KENDALL. Edinburgh: Published for the Royal Statistical Society by Oliver 
and Boyd. 1957. Pp. 343. 30s. 


The second volume of papers on statistical sources in the United Kingdom, like the first, consists of 
reprints of articles by different authors on particular topics which had previously appeared in the
Journal of the Royal Statistical Society. The original articles had been written at various times over 
a period of several years and were thus by the date of publication of the collection to varying extents 
out of date. The papers have, however, been brought more up to date by revision or, in many cases, 
by the addition of a short appendix on the more recent developments. The result is an invaluable 
collection of detailed discussions of the source material on a wide range of different subjects. 

The papers in this volume are, once again, divided into four groups—general surveys, commodities, 
transport and communication, and miscellaneous. Unlike the first volume, where more than half the 
articles were on particular commodities, the largest group here is made up of the general surveys. 
The papers in this group in the first volume, on censuses of production and distribution, oversea trade, 
agriculture, and labour statistics, were, however, on subjects which are important in a discussion of 
the general economic situation, whereas the papers in this volume tend to be on subjects with a more 
specialized interest. 

The authors of the papers are drawn partly from the civil service and the universities and partly
from specialists on their particular topics in business and industry. Naturally the nineteen different
authors use widely differing approaches to their subjects, but it is interesting to note that the authors 
from the civil service and the universities, with one or two exceptions, discuss the available sources 
with little or no reference to actual numerical data, whereas the other specialists, in general, use 
considerable statistical data to illustrate their discussion. An excellent example of the former approach 
is the article on food statistics by W. D. Stedman Jones; this discusses in considerable detail the methods 
of collection of food statistics and their meaning. While many of the methods discussed are no longer 
used because of the ending of the controls on which they were based, as is noted in the addition at the 
end of the article, the article will remain of very great value to those wishing to study food statistics for 
a very interesting period. The article on criminal statistics by Tom S. Lodge is one of the exceptions
in that, although by a civil servant, it contains several tables. Even here, however, the tables are
clearly included, not for the intrinsic importance of the information they give, but as illustrations of 
the points which Mr Lodge wishes to make on the interpretation of the statistics. The great value of 
this article, once again, is the information it gives on the methods of collection of the statistics, on 
their meaning, and on the consequent difficulties in interpretation. 
In some of the other articles much greater use is made of the statistical data in themselves as, for 
instance, in the estimates of net saving through life assurance in 1949 in the article on the statistics
of insurance by A. George Herbert and Roland D. Clarke. In one or two places elsewhere in the volume
one is left with the uncomfortable suspicion that the data are being used to prove a point
favourable to the author's viewpoint, rather than to inform the reader on the sources available, but
such examples are commendably rare.

The methods of classification of their subject-matter used by a few of the authors lead to repetition of
the description of some sources. This may improve the articles as works of reference, but it does not
add to one's pleasure when reading them as a whole. It is, in any case, difficult to remain interesting
when writing on the sources of statistical data and it is to be feared that some of the authors have not

One inevitable disadvantage in producing a book of this nature is that, while the costs of producing
it are high and the potential market must be small, some of the articles contained in it are likely to
become out of date very soon. It is to be hoped that the Royal Statistical Society and the publishers 
will be able to continue their good work by bringing out revised editions of the two volumes at intervals 
which are not too great. W. J. CORLETT 


Petrographic Modal Analysis: An Elementary Statistical Appraisal. By FELIX
CHAYES. New York: John Wiley and Sons Inc.; London: Chapman and Hall Ltd.
1956. Pp. xii+113. 44s.


It is always interesting to a statistician to read of the application of his methods in a specialized field 
and this book is no exception. And while, apart from the first chapter, where some interesting points
of geometrical probability are raised, the statistical content of the book is largely restricted to means 
and standard errors, the associated sampling problems, being in some respects special to petrography, 
make instructive reading. 

In the first instance the book treats of rocks, such as granite, which are composed of granules of 
different minerals, the percentage composition of which is known as the mode. A thin section of rock 
being taken, much like a histological section it would seem, the mode is determined either by means 
of a continuous line integrator along the lines of a rectangular grid, or by a point count made at the 
points of intersection of the lines of the grid. The first chapter deals with rocks in which the granules 
are randomly dispersed and establishes formally the proposition that the line integrals give unbiased 
estimates of the volumetric composition. The next treats banded rocks and the subsequent chapters 
are concerned with reproducibility, standard errors and other more technical questions. 
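As a rough illustration of the point-count idea described above (and not of Chayes's own procedures), the sketch below estimates the areal fraction of one mineral in a synthetic thin section by counting the grid intersections that fall on it. The disc-shaped granules, the unit-square section and the helper names `point_count_fraction` and `make_disc_section` are hypothetical assumptions made for the example; the continuous line-integrator variant is not shown.

```python
def point_count_fraction(inside, n):
    """Fraction of the n x n grid points (cell centres of the unit square)
    at which the hypothetical indicator inside(x, y) reports the mineral."""
    hits = 0
    for i in range(n):
        for j in range(n):
            x = (i + 0.5) / n
            y = (j + 0.5) / n
            if inside(x, y):
                hits += 1
    return hits / (n * n)


def make_disc_section(centres, radius):
    """Hypothetical thin section in which the mineral occupies discs of one radius."""
    def inside(x, y):
        return any((x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 for cx, cy in centres)
    return inside
```

For a single disc of radius 0.2, the count on a 200 × 200 grid should come out close to the true area fraction πr² ≈ 0.126; a finer grid reduces the discretization error.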

D. E. BARTON 


Non-parametric Methods in Statistics. By D. A. S. Fraser. New York: John Wiley 
and Sons, Inc.; London: Chapman and Hall, Ltd. 1957. Pp. x+299. 68s. 


This book is likely to be of considerable value to theoretical statisticians who are familiar with the 
advanced textbook by Cramér or that by Kendall or, preferably, with both. (The author's intention that
it shall serve as a direct sequel to Hoel’s intermediate book seems to me almost ludicrously optimistic, 
at least as far as British students are concerned.) Like Cramér's book, it contains its own introduction 
to the measure-theoretic methods it uses, but this is much too compressed for the newcomer to measure 
theory. The first two chapters (124 pages) are a fairly comprehensive introduction to the abstract ideas 
of present-day theoretical statistics, not confined to non-parametric problems. Chapter 3 is a ten-page 
introduction to the latter. Chapter 4 supplements Chapter 2 in the treatment of the problems of 
estimation and of tolerance regions, but there is no such supplement for the confidence region problem, 
which gets only a cursory six-page treatment in Chapter 2. Chapters 5-7 are the core of the book,
dealing with hypothesis testing, limiting distributions and large-sample properties of tests, heavily
oriented towards non-parametric procedures.

Despite this, the book by no means exhausts the discussion of non-parametric problems:
goodness-of-fit problems are not properly discussed, and although the distribution of order statistics is
derived for tolerance interval purposes, it is not used to set the closely related confidence intervals
for percentiles.

In fact, the book is directed to the abstract theory, and will be of little value to the user of statistical
methods, more particularly as the mathematical style is what may be called Annals-rebarbative: how
many statisticians who know them will unhesitatingly supply the three missing words in the following
characteristic definition from page 17?
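‘A statistic $y = t(x)$ is...for the family of probability measures $\{P_\theta \mid \theta \in \Omega\}$ over $\mathcal{X}(\mathcal{A})$ if there exists
a determination of the conditional distribution, given $t(x)$, which is independent of $\theta$; that is, if there
exists a function $P(A \mid t)$ such that

$$P_\theta\bigl(A \cap t^{-1}(B)\bigr) = \int_B P(A \mid t)\, dP_\theta^{t} \quad \text{for all } A \in \mathcal{A},\ B \in \mathcal{B}.$$

$P_\theta^{t}$ is the measure for $t(x)$ induced from the measure $P_\theta$ over $\mathcal{X}$.’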

This is heavy weather. The corresponding definition in Rao’s well-known book is, with the
three words omitted:
‘The necessary and sufficient condition that a distribution admits...is that the probability density
can be written in the form

$$\phi(x_1, \ldots, x_n \mid \theta) = \Phi(T, \theta)\, \psi_1(x_1, \ldots, x_n),$$

where $\Phi(T, \theta)$ is the density of the statistic $T$ and $\psi_1(x_1, \ldots, x_n)$, the density of the sample given $T$,
is independent of $\theta$.’

A few minor cautions are necessary to the student using the book. The statement on pages 2-3 
needs to be remembered in the statement of some of the theorems later in the text. The statements, on
page 144, that the k-statistics are minimum-variance unbiased estimators of the cumulants, and, on
page 145, that the sample mean difference is the minimum-variance unbiased estimator of the population
mean difference, are ‘non-parametric’ results in the sense that they do not hold if particular
distributional forms are considered, e.g. the mean of the rectangular distribution and the mean difference
of the normal distribution.

There is a fair number of trivial misprints which will cause no difficulty, and a very large number 
of problems for solution, many of them integrated with the text. The index is inadequate for a book 
likely to be of value as a reference source: authors are not indexed, although theorems are labelled 
by authors’ names. A. STUART


Wahrscheinlichkeitstheorie (Band LXXXVI of Die Grundlehren der Mathematischen
Wissenschaften). By H. RICHTER. Berlin: Springer-Verlag. 1956. Pp. 435. DM. 66.


This book is concerned to develop the theory as much as the calculus of probabilities and to do this 
abstractly as a mathematical discipline based on measure theory with an axiomatic basis similar to
Kolmogoroff’s. About one-third of the book is concerned with set theory and derived integration 
theory. Another third deals with the concept of probability as an induction from experience and the 
consequences of this in regard to the axiomatics, together with the general development of the set of 
axioms defined here. The rest of the book develops what may be distinguished as the calculus of 
probabilities; transformation of variables, moments and characteristic functions, limit theorems, 
standard distributions, central limit theorems, etc. As a consequence of the heavy development of the 
earlier chapters these are dealt with very thoroughly, but this much restricts the range of problems 
covered.

The author says that the book is designed for mathematics students and that, in effect, it will also
act as a textbook for set and integration theory. However, while he is at pains to point out the inclusion
of the more easily comprehended classical elementary theory in his general development, it may be
wondered whether the student who has read it will have a more flexible and understanding grasp of
probability theory than one who has read a more down-to-earth development of half the length.

D. E. BARTON


Statistica. By FRANCESCO BRAMBILLA. Milan: La Goliardica. Vol. I. La variabilità
strutturale. 1955. Pp. 672. Vol. II. La teoria della stima. 1956. Pp. 688. L. 9000.


These are textbooks of modern statistical mathematics and techniques written by an Italian.
The books are excellent and show evidence of wide reading in all the available
literature. The standard of exposition is high. On the whole, English students of statistics will probably
get no more from these books than they can already get from a combination of Yule & Kendall's
Introduction and Kendall's Advanced Theory. There is, however, a certain freshness about well-known
material in a foreign language, especially for those wishing to practise their Italian.
Vols. I and II are not sold separately. They are printed by some kind of off-set process, which
makes the price seem rather startling. A third volume, La Teoria della Inferenza, is promised.
F. N. DAVID


Probability, An Intermediate Textbook. By M. T. L. BIZLEY. Cambridge: Published
for the Institute and Faculty of Actuaries at the University Press. 1957. Pp. viii+230.
£1.


This book gives an elementary treatment of the simpler problems of classical probability theory. The 
logical sequence of the development of the subject necessarily follows, initially, the well-worn path: 
rules of probability, elementary combinatorial problems, binomial and hypergeometric distributions. 
These are followed by Bayes's theorem and what may be called the Generalized Addition Law for 
probabilities of independent events. This is rather oddly attributed to Waring (1792), though it was
clearly stated by De Moivre (1724) and in fact, for the case of three events, by Halley (1693) who gave
the geometrical picture of the theorem beloved of modern set theorists. Next follow chapters on
expectations, on problems which may yield their solution as a difference equation and on simple runs
of two alternatives. Finally, there is a chapter which introduces probability density functions by
means of the intuitively plausible concepts of geometrical probability. 

The book is apparently a course-book designed to cover a specific syllabus and this vitiates its appeal
to the general reader, since the topics covered are selective rather than encyclopaedic. On the other 
hand, it faces the limited mathematical attainments of its intended readers with ingenuity and the 
problems are, considering this limitation, surprisingly representative. Sometimes the ingenuity is
restrictive however; for instance, it would be hard to derive the normal distribution plausibly or 
instructively from a situation couched in terms of classical geometrical probabilities. 

A more serious criticism of the logical development of the book arises when the first chapter, on the 
nature of probability, and the appendix, on theories of probability, are considered. The author's own 
opinions are not clear: he would seem to subscribe to the ‘principle of insufficient reason’, which is 
hard to distinguish from the most subjective of degrees-of-belief theories, and to the views of Perks 
which many will consider to introduce an unnecessary ambiguity into the words ‘equally likely’ with 
very little gained. Apart from these points he gives an ear more or less impartially to the different 
schools of thought without having adequate space to deal with the contradictions between them. It 
may be doubted whether such treatment will help the student to a clear idea of what he is doing when 
he uses probability methods even if he understands, in a rather potted form, something of the different 
theories. There is in any event a danger that the student will accept such highly abridged versions as
the last word, to the permanent detriment of his understanding. Further, it is particularly surprising
that, in a book for actuarial students where such topics are broached, more emphasis is not laid on
frequency theories. It would appear to a non-actuary, such as the reviewer, that the insurance
companies, if no one else, would stand or fall according to the frequencies with which the various
contingencies of life occur and to the degree to which we may predict future frequencies from past
records. D. E. BARTON


Statistiska Metoder. Vols. I and II. By H. HYRENIUS. Göteborg, Sweden: Gumperts.
1957. Pp. 625. Swedish Kr. 42 (£2. 18s.)


This is a very elementary book about statistics for non-statisticians knowing hardly any mathematics. 
It is in three parts (previously issued as stencilled volumes), the first two being contained in Vol. I of
the present binding.
The first deals with the processing of data: tabulation, punched cards, graphical representation, the
computation of means, standard deviations, χ² (for contingency tables), correlation and regression
coefficients, curvilinear and partial regression and correlation coefficients. There is a wealth of worked
examples from medical, biological, sociological and economic fields. This part ends with an uncommonly
sensible commonsense treatment of economic time series.

The second part outlines the theory underlying the first, the results being for the most part stated
without proof, but the concepts and definitions being covered by ample, if rough, illustration. In this
way a very large amount of ground is covered: elementary probability theory; the binomial, Poisson,
multinomial, and uni- and bi-variate normal distributions; graduation by Pearson and Gram-Charlier
curves; maximum likelihood estimation, confidence intervals, hypothesis testing and power curves; the
t- and F-tests and their distributions; sampling variances and expectations; ... theory, all being treated.
The third part covers in more detail particular testing situations, namely,
those of contingency, goodness-of-fit, regression, simple analysis of variance, curvilinear regression and
ranking.


It is an excellent book of its kind and one feels not only that the medical man or biologist who follows
it through will have a wide variety of elementary statistical techniques at his disposal but that he will
be enabled to use them intelligently. D. E. BARTON


A Course in Multivariate Analysis. (Griffin’s Statistical Monographs and Courses,
no. 2.) By M. G. KENDALL. London: Charles Griffin and Company Ltd. 1957. Pp. 185.
22s.


An Introduction to Multivariate Statistical Analysis. By T. W. ANDERSON. New 
York: John Wiley and Sons Inc.; London: Chapman and Hall Ltd. 1958. Pp. xii+374. 100s.


Some Aspects of Multivariate Analysis. (Indian Statistical Series, no. 1.) By 
S. N. Roy. New York: John Wiley and Sons Inc.; Calcutta: Indian Statistical
Institute. 1957. Pp. viii+214. 64s.


These three books, published almost simultaneously, are the first to appear that deal exclusively and 
broadly with multivariate analysis. There is little overlap between them. They complement each other; 
each is excellent in its way. 

Prof. Kendall’s book is the revised version of a set of lecture notes for a course given at the University 
of North Carolina, Raleigh, and also at the Virginia Polytechnic Institute, Blacksburg, in 1954, and 
again subsequently at the London School of Economics. The author says in the Preface: ‘Multivariate 
Analysis in statistics is apt to be a baffling subject, especially for those students who want to use it in 
solving practical problems but do not possess the time or the inclination to plumb the depths of the 
mathematical theory to which it leads. This course was prepared with practical applications very much 
in the foreground. In it I have tried to expound the essential concepts and techniques and have limited 
the mathematical treatment as much as possible. In the present stage of knowledge this is no loss. 
The analysis of multivariate material requires to an unusual degree that peculiar blend of insight and 
skill in probabilistic interpretation which characterises the statistician, and for which pure mathematics 
is no substitute.’ 

Prof. Kendall has achieved his aim with the skill that one would expect from him. He begins by 
explaining the notion of analysing a sample of observations taken from a multivariate population into 
its principal components. This leads to a discussion of factor analysis and the estimation of functional 
relationships, and then an account of canonical analysis. A descriptive account (without proofs) of 
some relevant sampling theory and other mathematical topics follows, some multivariate tests
generalizing those of ordinary analysis of variance are given, and finally there is an account of discriminatory
analysis. All the main concepts and methods are illustrated by examples drawn from the literature— 
over twenty of them. These examples, with critical discussion, are the main strength of the book. They 
provide an answer to anyone who may ask: what is multivariate analysis all about, what is the use of it? 
Most of the discussions are persuasive. Your reviewer was particularly impressed by the beautiful 
treatment of a problem of regression with collinearities (pp. 71-3). At the other extreme, he was 
particularly unimpressed by an application of covariance analysis (p. 135)—but then, anyone who 
honestly tries to discuss multivariate analysis in the round, and not only the mathematical theory of it, 
can hardly fail to provoke dissent occasionally!

The book is easy to read, and makes only modest demands on the reader’s mathematical and 
statistical knowledge. It is likely to be particularly welcomed by persons who have already some slight 
or one-sided acquaintance with multivariate analysis and desire to see the whole subject in perspective 
and have key references. The text is photolithographed from typescript, and the cover is soft. 

Prof. Anderson’s book is also introductory, but with a different aim: not an appraisal of multivariate
methods, but a coherent presentation of the mathematical theory. ‘This book has been designed
primarily as a text for a two-semester course in multivariate statistics. It is hoped that the book will
also serve as an introduction to many topics in this area to statisticians who are not students and will
be used as a reference by other statisticians. For several years the book in the form of dittoed notes
has been used in a two-semester sequence of graduate courses at Columbia University... . It is hoped
that the more basic and important topics are treated here, though to some extent the coverage is
a matter of taste. Some of the more recent and advanced developments are only briefly touched on
in the last chapter.’

The book has the appearance of being the fruit of many years’ labour. It is most carefully written, 
and every argument is set out in full detail and with elegance. The first half of the book (corresponding 
to the first semester's course) treats of the following topics: the multivariate normal distribution, 
maximum likelihood estimation of the mean vector and covariance matrix, the sampling distribution 
of correlation coefficients, the T² statistic, and classification problems. The second half deals with the
Wishart and related distributions, various significance tests derived by the likelihood ratio principle, 
principal components, canonical analysis, and some distribution theory of characteristic roots in the
null case. There is also the final review chapter mentioned above, an appendix on matrix theory, and 
a fine bibliography. An amusing indication of the difference in intention between this and Prof. Kendall's
book is their treatment of that old sore thumb, factor analysis. Prof. Kendall, trying to exhibit what
people do, gets to the subject early and does the best he can for it. Prof. Anderson, trying to present 
a clear and not-too-difficult body of theory, relegates it to a niche among the recent and advanced 
developments. 

Undoubtedly Prof. Anderson's book will long remain the standard textbook and work of reference 
for multivariate theory based on the normal distribution. If such a beautifully written and beautifully 
printed book can have any fault—of being ever so slightly too smooth—what a good fault, and how 
easily remedied by reading either of the other two works under review!

About Prof. Roy's book, one is tempted to say that it begins where Prof. Anderson's leaves off. 
Literally that is untrue, as it is self-contained and presupposes no previous knowledge of multivariate 
theory. But it can certainly be described as a recent and advanced development, and its main object 
is outside Anderson's scope. ‘This monograph does not by any means attempt to cover the entire area
of multivariate analysis, or even a major part of it. Aside from certain basic notions and results due
to Fisher, Hotelling, Mahalanobis, Karl Pearson, Wilks, Wishart, Yule and some of their predecessors,
which have now become current coin, this monograph is primarily concerned with those developments
in multivariate analysis in which the author has been specially interested and with which he and some
of his collaborators have been associated over several years. Part of the material presented here, as
far as the author is aware, has not been published before, while the rest has been collected from papers
by various workers in this sector including the author and his collaborators. It will be seen that in
the monograph the statistical approach to different problems and the mathematical treatment of all
such problems are uniform and perhaps somewhat individual, and that this applies to all specific results,
no matter whether they are due to the author and his collaborators, or to other workers in
the field or to both groups simultaneously.’

Prof. Roy begins by enunciating a principle of test construction somewhat different from the usual
likelihood ratio method. (Anyone who can be saying something interesting about tests as soon as
page 6 of his book deserves applause.) With this he chooses test criteria for a number of standard
normal-theory null hypotheses. The criteria can be expressed in terms of the characteristic roots of
a determinantal equation—sometimes the largest root, sometimes both largest and smallest roots. 
Associated with each test are measures of departure from the null hypothesis, based on a concept of 
distance; in terms of these the author obtains bounds for the operating characteristics of the tests. 
He is then able to achieve his main objective, namely, to derive by inversion confidence bounds for 
these measures of departure from the null hypothesis. There is also a final chapter concerned with the 
analysis of categorical data (contingency tables). Work is evidently still in progress, and there are
various references to further results to be presented in another monograph or in a second edition of this
one. The book will interest many whose concern is theoretical research in statistics.
F. J. ANSCOMBE 


Statistical Exercises. Part II. Compiled by N. L. JOHNSON. Issued by Department of
Statistics, University College, London. 1957. Pp. 107. 12s. 


Even if one keeps a careful record of the details of problems dealt with in statistical consulting work, 
one frequently finds that there is a dearth of suitable examples for teaching purposes. Research workers 
can often cope with the analysis of standard situations themselves, and are liable to consult a statis- 
tician only when some awkward deviation from the textbook pattern occurs. For this reason a graded 
series of standard exercises is always welcome.
The present compilation by Dr N. L. Johnson is divided into four main sections. The first deals with 
the simpler analysis of variance techniques, and starts with tests for equality of variances and the 
transformation of variables. Such matters as Latin squares, missing plot technique and
cross-classification with unequal frequencies are included here.

Next, we have a set of exercises on factorial experiments. Examples are given on confounding,
fractional replication, split plots and incomplete blocks.

The third section covers correlation and regression, including multiple and curvilinear regression.
A useful feature is the provision of notes on the Choleski method of inverting a symmetric matrix.
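The compilation's own notes are not reproduced in the review. As a generic sketch of the Choleski (square-root) method, assuming a symmetric positive-definite matrix A, one factorizes A = LLᵀ with L lower triangular, inverts L by forward substitution, and forms A⁻¹ = (L⁻¹)ᵀL⁻¹. The function names are hypothetical and plain lists are used to keep the example self-contained.

```python
import math


def choleski(a):
    """Return lower-triangular L with a = L L^T (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def invert_lower(L):
    """Invert a lower-triangular matrix by forward substitution, one column at a time."""
    n = len(L)
    M = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(col, n):
            rhs = 1.0 if i == col else 0.0
            s = sum(L[i][k] * M[k][col] for k in range(col, i))
            M[i][col] = (rhs - s) / L[i][i]
    return M


def choleski_inverse(a):
    """Inverse of a symmetric positive-definite matrix via a^{-1} = (L^{-1})^T L^{-1}."""
    Linv = invert_lower(choleski(a))
    n = len(a)
    return [[sum(Linv[k][i] * Linv[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

For example, `choleski_inverse([[4.0, 2.0], [2.0, 3.0]])` gives approximately `[[0.375, -0.25], [-0.25, 0.5]]`. Only one triangular factor has to be inverted, which is the economy the method offers for symmetric matrices.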

Finally, there are examples on a number of miscellaneous methods such as dosage-mortality tech- 
niques, discriminant functions, time series and curve-fitting. 

These exercises have been excellently chosen and are likely to be of great use in teaching the appli- 
cation of statistical methods, though some of the examples are probably too long for use in a practical 
class. However, a student taking a course in statistics wants the opportunity to work through a good 
range of different problems without having too much repetition, and without having to spend too 
much time on investigating the influence of awkward complications in the data. 

This volume makes a most welcome addition to the set of exercises on more elementary matters 
previously issued by the Department of Statistics at University College London. Students of statistics
and their harassed teachers are now no doubt eagerly awaiting the supplement, promised in the preface, 
containing worked solutions or suggested methods of attack! NORMAN T. J. BAILEY


Tables of the Non-Central t-Distribution. By G. J. RESNIKOFF and G. J. LIEBERMAN.
Stanford, California: Stanford University Press. 1957. Pp. 389. $12.50. 


Possibly the simplest application of the present tables lies in the problem of estimating the proportion,
$P$, of a Gaussian population which exceeds a given fixed limit $L$. This proportion depends only on the
ratio $U = (L - \mu)/\sigma$, which may be estimated from a sample of $n$ observations by the statistic
$u = (L - \bar{x})/s$. It may be shown that $\sqrt{n}\,u$ can be expressed as the ratio of $(z + \delta)$ to $\sqrt{w}$, where
$\delta = \sqrt{n}\,U$, $z$ is a unit normal deviate and $w$ is distributed independently as $\chi^2/f$ with $f = n - 1$
degrees of freedom. Such a quantity has been termed a non-central $t$ variable and this seems to be the
natural extension of the usual way in which the central ‘Student’ $t = z/\sqrt{w}$ is now defined. However,
in tabulating the non-central distribution, with its practical applications in view, one soon finds that it
is not sufficient to consider only moderate values of $t$. In the problem just mentioned, for instance, the
non-centrality parameter $\delta$ is proportional to $\sqrt{n}$ and increases with the sample size. Since the mean
of the distribution is approximately $\delta$, the major part of the distribution is not therefore confined (as it
is in the central case) to moderate values. Accordingly, to secure more compact tabulation, Resnikoff
and Lieberman have in the present volume used the argument $t/\sqrt{f}$. Against this quantity at intervals
of 0.05 they tabulate both the probability density and the probability integral of $t$ to four decimal
places. This information is provided for 280 separate distribution curves defined by 28 values of $f$,
viz. $f = 2(1)24(5)49$, each associated with 10 values of $\delta$, or rather 10 values of $\delta(f + 1)^{-1/2}$, viz.
0.674490, 1.036433, 1.281552, 1.514102, 1.750686, 1.959964, 2.326348, 2.652070, 2.807034 and
3.090232. It will be observed that these latter are standard multiples corresponding to simple fractions
of the Gaussian distribution and indeed, in application to the problem described above, correspond
directly to $P$ = 0.25, 0.15, 0.10, 0.065, 0.04, 0.025, 0.01, 0.004, 0.0025 and 0.001.
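The connexion between the tabulated constants and the proportions $P$ can be checked with a short sketch, assuming Python's standard-library `statistics.NormalDist`: since $n = f + 1$, the quantity $\delta(f + 1)^{-1/2}$ is simply $U$, the upper Gaussian quantile for $P$. The sampler draws from the representation $(z + \delta)/\sqrt{w}$ given in the review. The function names are hypothetical, and nothing here reproduces the authors' own computing methods.

```python
import math
import random
from statistics import NormalDist

# Upper-tail proportions P quoted in the review.
P_VALUES = [0.25, 0.15, 0.10, 0.065, 0.04, 0.025, 0.01, 0.004, 0.0025, 0.001]


def u_for_proportion(p):
    """U = (L - mu)/sigma such that a fraction p of a Gaussian population exceeds L."""
    return NormalDist().inv_cdf(1.0 - p)


def delta_for(p, f):
    """Non-centrality delta = sqrt(n) * U with n = f + 1 observations."""
    return math.sqrt(f + 1) * u_for_proportion(p)


def draw_noncentral_t(delta, f, rng):
    """One draw of (z + delta) / sqrt(w), z ~ N(0, 1), w ~ chi-squared_f / f, independent."""
    z = rng.gauss(0.0, 1.0)
    # chi-squared with f degrees of freedom is Gamma(shape f/2, scale 2)
    w = rng.gammavariate(f / 2.0, 2.0) / f
    return (z + delta) / math.sqrt(w)
```

Evaluating `u_for_proportion(p)` for each entry of `P_VALUES` should agree with the ten constants quoted above to about six decimal places. Simulated means of `draw_noncentral_t` sit somewhat above $\delta$ for small $f$, approaching it as $f$ grows.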

In a short table at the end of the volume certain percentage points of the 280 distributions are also
provided, obtained by inverse interpolation from the probability integral. In this connexion the
authors state that, where comparisons are possible, there is satisfactory agreement with the percentage
points tabled by Johnson and Welch in Biometrika, 34. (These were calculated by means of an
asymptotic expansion from the starting point that $(t - \delta)(1 + t^2/2f)^{-1/2}$ has approximately a standard
Gaussian distribution.)
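A minimal sketch of the first-order approximation mentioned in the parenthesis, assuming only that $(t - \delta)(1 + t^2/2f)^{-1/2}$ is treated as exactly standard normal. It omits the further terms of Johnson and Welch's asymptotic expansion, so it is cruder than their tabled points. Solving $(t - \delta)^2 = z^2(1 + t^2/2f)$ and taking the root on the side of $\delta$ indicated by the sign of $z$ gives an approximate percentage point. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def approx_cdf(t, delta, f):
    """Approximate Pr(T <= t) for non-central t with f d.f. and non-centrality delta,
    treating (t - delta) / sqrt(1 + t^2/(2f)) as a standard normal deviate."""
    return NormalDist().cdf((t - delta) / math.sqrt(1.0 + t * t / (2.0 * f)))


def approx_percentage_point(prob, delta, f):
    """Invert the same approximation: solve t - delta = z * sqrt(1 + t^2/(2f)) for t."""
    z = NormalDist().inv_cdf(prob)
    a = 1.0 - z * z / (2.0 * f)
    if a <= 0.0:
        raise ValueError("approximation breaks down: need 2f > z**2")
    disc = 1.0 + (delta * delta - z * z) / (2.0 * f)
    if disc < 0.0:
        raise ValueError("no real solution for these parameters")
    # z carries its own sign, selecting the root above or below delta
    return (delta + z * math.sqrt(disc)) / a
```

With $\delta = 0$ and $f = 10$ the approximate upper 5 per cent point is about 1.77, against the exact central value of about 1.81, which indicates the size of error to expect from the first term alone.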

In their introductory matter Resnikoff and Lieberman describe the applications of their tables to
the familiar problems associated with the estimation of proportions of a Gaussian population and with
the calculation of power functions of statistical tests. They include also reference to sequential ratio
tests and to the calculation of certain expectations of functions of $t$ which have occurred in the literature
of sampling inspection.

Comprehensive tables of the density function and of the probability integral of the non-central 
t-distribution have not been given before, and the volume under review represents a most valuable 
advance in the tabulation of a function which confronts the numerical worker with a number of rather 
difficult problems. B. L. WELCH


Statistické Tabulky (Statistical Tables). Edited by JAROSLAV JANKO. Prague:
Československá Akademie Věd (Czechoslovakian Academy of Sciences), 1958. Pp. 251.
Price 18.20 Kčs.


This publication contains forty statistical tables, occupying 140 pages. There are 108 pages of contents,
index and introduction, and, at the end of the book, a small glossary index, giving English and Russian
equivalents for Czech statistical terms used in the introduction. 

The method of arrangement in groups of related tables follows that used in Biometrika Tables for 
Statisticians. The format is similar, too, so that, particularly when looking at the introduction, there 
is a feeling of familiarity which unfortunately disappears on trying to read the text. 

Tables 1-8 (Normal, χ² and t distributions) are more or less familiar to most statisticians. The Normal
tables are rather more concise than usual, and there are no dosage-mortality tables, but the tables
of χ² significance limits (based on the work of A. Hald and S. A. Sinkbæk) are unusually extensive,
and there is also a table of significance limits for χ²/(degrees of freedom).

Tables 9-13 (non-central t, central and non-central F distributions) include charts of power functions
of certain t- and F-tests, and are reproduced from Biometrika sources, as are also Table 15 (charts for
confidence intervals for the correlation coefficient), Tables 19-22 (distribution of range and of
studentized range) and Table 25 (studentized extreme deviate). Also, Tables 29-31 (Poisson distribution)
have close parallels in Biometrika Tables for Statisticians.

Table 14 gives percentage points of the distribution of the correlation coefficient, reproduced from Fisher
and Yates's Tables, while Table 32, giving values of $2\sin^{-1}\sqrt{x}$, is rather more extensive than, but
similar to, another table in this publication.
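The quantity tabled in Table 32 is the angular (arcsine) transformation of a proportion. A minimal sketch of the computation follows. It assumes the values are wanted in radians (printed tables often give degrees instead), and the function name is hypothetical.

```python
import math


def angular_transform(x: float) -> float:
    """Angular (arcsine) transformation 2 * asin(sqrt(x)) of a proportion
    x with 0 <= x <= 1, returned in radians (range 0 to pi)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be a proportion between 0 and 1")
    return 2.0 * math.asin(math.sqrt(x))


# For a binomial proportion x = r/n the transformed value has variance
# roughly 1/n whatever the true proportion (away from 0 and 1), which is
# the usual reason for tabling this function.
print([round(angular_transform(p), 4) for p in (0.1, 0.5, 0.9)])
```

The values for $x$ and $1-x$ always sum to $\pi$, so a table need only cover half the range.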

Table 16 gives tables of significance limits for Cochran's criterion ($\max s_i^2/\sum s_i^2$), reproduced from
Selected Techniques of Statistical Analysis; Table 35 (tests for randomness of grouping in a sequence)
also comes from this source. Other tables from American sources include Tables 23-27 (outlier criteria)
and 34 (tolerance limits for a Normal distribution).
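Cochran's criterion is the largest of $k$ sample variances divided by their sum. The sketch below, which assumes groups of equal size and uses a hypothetical function name, computes the statistic from raw observations. It computes only the statistic. Judging significance still requires tabled limits such as those in Table 16.

```python
from statistics import variance


def cochran_criterion(groups):
    """Cochran's criterion: the largest sample variance divided by the sum
    of the k sample variances.  Each group is assumed to have the same
    number of observations, as the tabled limits require."""
    variances = [variance(g) for g in groups]
    return max(variances) / sum(variances)


samples = [[1, 2, 3], [2, 4, 6], [1, 2, 3]]
print(cochran_criterion(samples))  # variances 1, 4, 1 -> 4/6
```

The statistic always lies between $1/k$ (all variances equal) and 1, and large values point to one variance standing out from the rest.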

Tables 17-18 reproduce Hald's tables for estimating parameters of truncated and censored Normal
distributions. Table 28 gives confidence limits for a binomial proportion (an unusually extensive table),
and Table 33 gives distribution-free limits for the median (acknowledgement is made to K. R. Nair).
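Distribution-free limits for the median rest on a binomial argument. For a continuous distribution, the order statistics $x_{(r)}$ and $x_{(n+1-r)}$ enclose the median with probability $1 - 2P(B \le r-1)$, where $B$ is a Binomial$(n, \tfrac12)$ variate. The sketch below, with a hypothetical function name, finds the largest $r$ meeting a stated confidence level. It illustrates the principle only and is not claimed to reproduce Nair's table.

```python
from math import comb


def median_confidence_ranks(n: int, alpha: float = 0.05):
    """Largest r such that the order statistics x_(r) and x_(n+1-r) of a
    sample of n give distribution-free confidence limits for the median
    with coverage at least 1 - alpha.  Returns (r, exact coverage), or
    None if n is too small for the requested level."""
    best = None
    cumulative = 0  # running sum of C(n, j) for j = 0, ..., r - 1
    for r in range(1, n // 2 + 1):
        cumulative += comb(n, r - 1)
        tail = cumulative / 2 ** n  # P(Binomial(n, 1/2) <= r - 1)
        if 2 * tail > alpha:
            break
        best = (r, 1 - 2 * tail)
    return best


print(median_confidence_ranks(10))  # (2, 0.978515625)
```

For a sample of 10, the second smallest and second largest observations therefore give limits with coverage about 97.9%, while a sample of 5 is too small to reach 95%.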

The final group of tables (37-40), from American, Hungarian and Russian sources, gives significance
limits for various 'Kolmogorov' criteria, based on comparisons of observed with theoretical cumulative
distribution functions, or with each other.
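The one-sample Kolmogorov criterion is the largest absolute difference between the sample's empirical distribution function and the hypothesised one. A minimal sketch follows, assuming a continuous hypothesised distribution function supplied as a Python function. The names are hypothetical, and the two-sample form (comparing two empirical functions "with each other") is not shown.

```python
def kolmogorov_statistic(sample, cdf):
    """One-sample Kolmogorov statistic D_n = sup |F_n(x) - F(x)|, where F_n
    is the empirical distribution function of the sample and F is the
    hypothesised continuous distribution function."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fx = cdf(x)
        # F_n jumps from i/n to (i+1)/n at x, so check both sides of the jump.
        d = max(d, (i + 1) / n - fx, fx - i / n)
    return d


def uniform_cdf(x):
    """Distribution function of the uniform distribution on (0, 1)."""
    return min(max(x, 0.0), 1.0)


print(kolmogorov_statistic([0.25, 0.75], uniform_cdf))  # 0.25
```

Because the empirical function is a step function, the supremum can only occur just before or just after one of the sample points, so checking both sides of each jump is enough.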

So far as could be gleaned from a very rough translation, the introductory text gives categorical
directions for applying the various techniques, with but little scope for individual judgement. However,
most English-speaking readers will value the book mainly for the special tables it contains, particularly
nos. 8 ($\chi^2/$(degrees of freedom)), 16-18, 28, 34 and 37-40. The quality of paper and binding
is not as good as would be desirable in a volume which is likely to be handled fairly frequently. There
are also a number of misprints, rather more than usual in a book of this type. Examples are:
$A(t, t_0, \epsilon)$ for $A(f, t_0, \epsilon)$ on p. 135, several misplaced decimal points in Table 17, and ¢,, for t, or fj, on
pp. 39-40 in the introduction. However, none of the misprints found were really misleading, or more
than minor nuisances. N. L. JOHNSON


Tables of Integrals and other Mathematical Data (third edition). By HERBERT B.
DWIGHT. New York and London: The Macmillan Company. 1957. Pp. 288. 21s.


For those research workers in subjects where mathematical techniques are used as a tool, as a means 
to an end instead of an end in itself, this third edition of condensed information is valuable. Many 
statisticians use and value the tables of Smithsonian mathematical formulae, and this present book, 
which was first in the field, covers in algebra and trigonometry much the same ground. The
supplementary numerical tables, given in most cases to only four decimal figures, are useful for quick
calculations. These tables are of the usual trigonometric functions, the hyperbolic functions, natural
logarithms and Bessel functions.
The book is useful for reference purposes and should be bought by statistical libraries.
F. N. DAVID 
Fractional Factorial Experiment Designs for Factors at Two Levels. U.S. Dept.
of Commerce: National Bureau of Standards, Applied Mathematics Series no. 48.
1957. Pp. ii + 85. 50 cents.


It is well known that the use of factorial designs in experimental work may involve the collection of 
a very large number of observations (or measurements). It is natural therefore that attention should 
be paid to the possibility of the collection of fewer measurements, and the statistical technique which 
has emerged is that known as 'fractional' factorial experimental designs. This booklet lists 125
experimental designs of this kind. It will be useful to everyone concerned with this particular type of
statistical analysis. F. N. DAVID 


Vector Analysis. By L. BRAND. New York: John Wiley and Sons, Inc.; London: 
Chapman and Hall Ltd. 1957. Pp. 299. 48s. 


This book gives the reader an excellent introduction to vector algebra, vector geometry and vector 
calculus, and this introduction is complemented by three well-thought-out chapters on applications of
these topics to dynamics, fluid mechanics and electrodynamics. 

The first five chapters form the content of a good course on vector analysis for physicists and 
chemists, and also give the applied mathematician an insight into this valuable tool at the level of 
the first or second (honours) year. The book work here is clear and the emphasis sound: it includes 
kinematics, differential geometry and the integral theorems of Green and Stokes. Interlaced with the 
text are relevant and not too taxing examples. One imagines that a worked example followed by 
exactly the same question in the problems is a minor aberration in documenting the material: see 
example 1 on page 27, together with problem 3 on page 29. 

The applications are necessarily less scholarly, since they treat complex physical disciplines in all
too brief chapters. Dynamics occupies 18 pages and we get little more than statements on rigid body
problems. The section on the solar system is pleasing. Electrodynamics, treated in Lorentz-Heaviside
(the 4π's occasionally disappear, as on page 211) and Giorgi units, is again too short except as a revision
course in electrostatics and magnetostatics.

The final chapter is a good attempt at introducing the young student to the ideas of linear vector 
spaces, including Hilbert space, at an early stage in his career. E. A. POWER


Linear Algebra for Undergraduates. By D. C. MURDOCH. New York: John Wiley and
Sons Inc.; London: Chapman and Hall Ltd. 1957. Pp. xi + 239. 44s.


This textbook has been written for students reading mathematics and the physical sciences. They are
supposed to have reached a standard something like the G.C.E. Advanced Level in this country, but
not to be ready for the mathematically sophisticated point of view taken in modern abstract
algebra.

The topics covered are best described by listing the chapter headings:

(1) Vectors and vector spaces. 

(2) Matrices, rank, and systems of linear equations. 

(3) Further algebra of matrices. 

(4) Further geometry of real vector spaces. 

(5) Transformations of co-ordinates in a vector space. 

(6) Linear transformations in a vector space. 

(7) Similar matrices and diagonalization theorems. 

(8) Reduction of quadratic forms. 

(9) Vector spaces over the complex field. 

One has the impression throughout that the text has been carefully written and clearly displayed,
and that it reflects the experience of a good teacher. As some of the chapter headings indicate, there 
is a development by stages which match the reader's growing feeling for the subject. 

Vectors are introduced as n-tuples of numbers; the abstract definition is explained in one of two
appendices. Determinants of order n are defined and used, but the reader is referred elsewhere for the
set of simple properties related to the performance of elementary operations on the rows and columns.
Three-dimensional position vectors are much used for illustration; for those who need it, the second
appendix briefly sets out the elements of co-ordinate geometry of three dimensions.
tforwi sometimes indi further theoretical ideas, 


Numerical Analysis. Vol. VI. (Proceedings of the 6th Symposium in Applied Mathematics
of the American Mathematical Society, held 1953.) Edited by JOHN H. CURTISS.
New York, Toronto and London: McGraw-Hill Book Company Inc., for The American
Mathematical Society. 1956. Pp. 303. 73s.


This volume contains nineteen papers out of the twenty-one originally given at the Symposium on 
numerical analysis held at Santa Monica in August, 1953. The papers range widely in subject-matter, 


approximation to the normal integral serviceable for all positive values of the argument. A paper 
by Sard, on function spaces and approximation, bears on integral representations of remainders. 


eigenvalues of symmetric integral equations when obtained by numerical quadrature, and three 
(Bergman, Clutterham and Taub, Frankel) on different problems of numerical solution of partial 
differential equations.

The book is well produced and has a good index. T. LEWIS 


Note regarding 
Bibliography of nonparametric statistics and related topics 


Dr I. R. Savage is arranging for a revision of this Bibliography which was published in 1953 in the 
Journal of the American Statistical Association, 48, 844-906. Material up to and including publications 
of the year 1959 will be incorporated and it is planned to lay more emphasis on applications than before. 
References (particularly to literature not in the English language), reprints and technical reports on 
the theory or applications of nonparametric statistics would be greatly appreciated. Also, corrections 
and additions to the original Bibliography are desired. 
Material should be sent to:
I. RICHARD SAVAGE, 
Statistics Department, University of Minnesota, 
Minneapolis, Minnesota, USA
Multivariate Correlational Analysis. By P. H. DuBOIS. New York: Harper and Brothers. 1957.
Pp. 202. $4.50. 


Queues, Inventories and Maintenance. By P. M. Morse. New York: John Wiley and Sons Inc.; 
London: Chapman and Hall Ltd. 1958. Pp. 202. 52s. 


The Upper Confidence Limits for the Failure Probability of Complex Networks. (Sandia
Corporation Research Report 4133 (TR).) Washington, U.S.A.: Office of Technical Services,
Department of Commerce. 1957. Pp. 39. $1.25.


Further Contributions to the Solution of Simultaneous Linear Equations and the
Determination of Eigenvalues. (Applied Mathematics Series no. 49.) Washington, U.S.A.: National
Bureau of Standards. 1958. Pp. 81. 50 cents.


Integrals of Airy Functions. (Applied Mathematics Series no. 52.) Washington, U.S.A.: National
Bureau of Standards. 1958. Pp. 28. 25 cents.


A Treatise on the Analytic Geometry of Three Dimensions. Reprint of seventh edition, Vol. I.
By G. SALMON. New York: Chelsea Publishing Co. Pp. 470. $4.95.


Calculus of Finite Differences. Reprint of fourth edition. By G. BOOLE. New York: Chelsea
Publishing Co. Pp. 336. $4.95.


Morphological Integration. By E. C. OLSON and R. L. MILLER. Cambridge University Press. 1958.
Pp. 317. 78s.


Atomic Energy Levels as Derived from the Analyses of Optical Spectra. Vol. III. By CHARLOTTE
E. MOORE. Washington, U.S.A.: National Bureau of Standards. 1958. Pp. 283. $2.50.


On the Dynamics of Exploited Fish Populations. By R. J. H. BEVERTON and S. J. HOLT. Fisheries
Division of Food and Agricultural Organisation of the United Nations, Rome. (U.K. Sales Agent,
H.M.S.O.) 1957. Pp. 530. £6. 6s. 0d.


Genetics and the Improvement of Tropical Crops. By Sir JOSEPH HUTCHINSON. Cambridge
University Press. 1958. Pp. 30. 3s. 6d.


Experimental Design in Psychology and the Medical Sciences. By A. E. MAXWELL. London:
Methuen and Co. Ltd. 1958. Pp. 147. 17s. 6d.


Measurement of Levels of Health. World Health Organisation Technical Report no. 137. W.H.O.,
Geneva, Switzerland. (U.K. Sales Agent, H.M.S.O.) 1957. Pp. 29. 1s. 9d.


Probleme der Statistischen Methodenlehre in den Sozialwissenschaften (third edition). By 
O. ANDERSON. Würzburg, Germany: Physica-Verlag. 1957. Pp. 358. DM. 24.50. 


Non-group Enrollment for Health Insurance. By S. LEVINE, O. W. ANDERSON and G. GORDON.
Cambridge, Mass.: Harvard University Press. London: Oxford University Press. 1958. Pp. 171.
8. 


Economics as a Science. By A. G. PAPANDREOU. U.S.A.: J. B. Lippincott Company. Pp. xi + 148.